<!DOCTYPE html>
<HTML lang = "en">
<HEAD>
  <meta charset="UTF-8"/>
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
  <title>Introduction to Scientific Machine Learning through Physics-Informed Neural Networks</title>
  

  <script type="text/x-mathjax-config">
    MathJax.Hub.Config({
      tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]},
      TeX: { equationNumbers: { autoNumber: "AMS" } }
    });
  </script>

  <script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
  </script>

  
<style>
pre.hljl {
    border: 1px solid #ccc;
    margin: 5px;
    padding: 5px;
    overflow-x: auto;
    color: rgb(68,68,68); background-color: rgb(251,251,251); }
pre.hljl > span.hljl-t { }
pre.hljl > span.hljl-w { }
pre.hljl > span.hljl-e { }
pre.hljl > span.hljl-eB { }
pre.hljl > span.hljl-o { }
pre.hljl > span.hljl-k { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-kc { color: rgb(59,151,46); font-style: italic; }
pre.hljl > span.hljl-kd { color: rgb(214,102,97); font-style: italic; }
pre.hljl > span.hljl-kn { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-kp { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-kr { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-kt { color: rgb(148,91,176); font-weight: bold; }
pre.hljl > span.hljl-n { }
pre.hljl > span.hljl-na { }
pre.hljl > span.hljl-nb { }
pre.hljl > span.hljl-nbp { }
pre.hljl > span.hljl-nc { }
pre.hljl > span.hljl-ncB { }
pre.hljl > span.hljl-nd { color: rgb(214,102,97); }
pre.hljl > span.hljl-ne { }
pre.hljl > span.hljl-neB { }
pre.hljl > span.hljl-nf { color: rgb(66,102,213); }
pre.hljl > span.hljl-nfm { color: rgb(66,102,213); }
pre.hljl > span.hljl-np { }
pre.hljl > span.hljl-nl { }
pre.hljl > span.hljl-nn { }
pre.hljl > span.hljl-no { }
pre.hljl > span.hljl-nt { }
pre.hljl > span.hljl-nv { }
pre.hljl > span.hljl-nvc { }
pre.hljl > span.hljl-nvg { }
pre.hljl > span.hljl-nvi { }
pre.hljl > span.hljl-nvm { }
pre.hljl > span.hljl-l { }
pre.hljl > span.hljl-ld { color: rgb(148,91,176); font-style: italic; }
pre.hljl > span.hljl-s { color: rgb(201,61,57); }
pre.hljl > span.hljl-sa { color: rgb(201,61,57); }
pre.hljl > span.hljl-sb { color: rgb(201,61,57); }
pre.hljl > span.hljl-sc { color: rgb(201,61,57); }
pre.hljl > span.hljl-sd { color: rgb(201,61,57); }
pre.hljl > span.hljl-sdB { color: rgb(201,61,57); }
pre.hljl > span.hljl-sdC { color: rgb(201,61,57); }
pre.hljl > span.hljl-se { color: rgb(59,151,46); }
pre.hljl > span.hljl-sh { color: rgb(201,61,57); }
pre.hljl > span.hljl-si { }
pre.hljl > span.hljl-so { color: rgb(201,61,57); }
pre.hljl > span.hljl-sr { color: rgb(201,61,57); }
pre.hljl > span.hljl-ss { color: rgb(201,61,57); }
pre.hljl > span.hljl-ssB { color: rgb(201,61,57); }
pre.hljl > span.hljl-nB { color: rgb(59,151,46); }
pre.hljl > span.hljl-nbB { color: rgb(59,151,46); }
pre.hljl > span.hljl-nfB { color: rgb(59,151,46); }
pre.hljl > span.hljl-nh { color: rgb(59,151,46); }
pre.hljl > span.hljl-ni { color: rgb(59,151,46); }
pre.hljl > span.hljl-nil { color: rgb(59,151,46); }
pre.hljl > span.hljl-noB { color: rgb(59,151,46); }
pre.hljl > span.hljl-oB { color: rgb(102,102,102); font-weight: bold; }
pre.hljl > span.hljl-ow { color: rgb(102,102,102); font-weight: bold; }
pre.hljl > span.hljl-p { }
pre.hljl > span.hljl-c { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-ch { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-cm { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-cp { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-cpB { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-cs { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-csB { color: rgb(153,153,119); font-style: italic; }
pre.hljl > span.hljl-g { }
pre.hljl > span.hljl-gd { }
pre.hljl > span.hljl-ge { }
pre.hljl > span.hljl-geB { }
pre.hljl > span.hljl-gh { }
pre.hljl > span.hljl-gi { }
pre.hljl > span.hljl-go { }
pre.hljl > span.hljl-gp { }
pre.hljl > span.hljl-gs { }
pre.hljl > span.hljl-gsB { }
pre.hljl > span.hljl-gt { }
</style>



  <style type="text/css">
  @font-face {
  font-style: normal;
  font-weight: 300;
}
@font-face {
  font-style: normal;
  font-weight: 400;
}
@font-face {
  font-style: normal;
  font-weight: 600;
}
html {
  font-family: sans-serif; /* 1 */
  -ms-text-size-adjust: 100%; /* 2 */
  -webkit-text-size-adjust: 100%; /* 2 */
}
body {
  margin: 0;
}
article,
aside,
details,
figcaption,
figure,
footer,
header,
hgroup,
main,
menu,
nav,
section,
summary {
  display: block;
}
audio,
canvas,
progress,
video {
  display: inline-block; /* 1 */
  vertical-align: baseline; /* 2 */
}
audio:not([controls]) {
  display: none;
  height: 0;
}
[hidden],
template {
  display: none;
}
a:active,
a:hover {
  outline: 0;
}
abbr[title] {
  border-bottom: 1px dotted;
}
b,
strong {
  font-weight: bold;
}
dfn {
  font-style: italic;
}
h1 {
  font-size: 2em;
  margin: 0.67em 0;
}
mark {
  background: #ff0;
  color: #000;
}
small {
  font-size: 80%;
}
sub,
sup {
  font-size: 75%;
  line-height: 0;
  position: relative;
  vertical-align: baseline;
}
sup {
  top: -0.5em;
}
sub {
  bottom: -0.25em;
}
img {
  border: 0;
}
svg:not(:root) {
  overflow: hidden;
}
figure {
  margin: 1em 40px;
}
hr {
  -moz-box-sizing: content-box;
  box-sizing: content-box;
  height: 0;
}
pre {
  overflow: auto;
}
code,
kbd,
pre,
samp {
  font-family: monospace, monospace;
  font-size: 1em;
}
button,
input,
optgroup,
select,
textarea {
  color: inherit; /* 1 */
  font: inherit; /* 2 */
  margin: 0; /* 3 */
}
button {
  overflow: visible;
}
button,
select {
  text-transform: none;
}
button,
html input[type="button"], /* 1 */
input[type="reset"],
input[type="submit"] {
  -webkit-appearance: button; /* 2 */
  cursor: pointer; /* 3 */
}
button[disabled],
html input[disabled] {
  cursor: default;
}
button::-moz-focus-inner,
input::-moz-focus-inner {
  border: 0;
  padding: 0;
}
input {
  line-height: normal;
}
input[type="checkbox"],
input[type="radio"] {
  box-sizing: border-box; /* 1 */
  padding: 0; /* 2 */
}
input[type="number"]::-webkit-inner-spin-button,
input[type="number"]::-webkit-outer-spin-button {
  height: auto;
}
input[type="search"] {
  -webkit-appearance: textfield; /* 1 */
  -moz-box-sizing: content-box;
  -webkit-box-sizing: content-box; /* 2 */
  box-sizing: content-box;
}
input[type="search"]::-webkit-search-cancel-button,
input[type="search"]::-webkit-search-decoration {
  -webkit-appearance: none;
}
fieldset {
  border: 1px solid #c0c0c0;
  margin: 0 2px;
  padding: 0.35em 0.625em 0.75em;
}
legend {
  border: 0; /* 1 */
  padding: 0; /* 2 */
}
textarea {
  overflow: auto;
}
optgroup {
  font-weight: bold;
}
table {
  font-family: monospace, monospace;
  font-size : 0.8em;
  border-collapse: collapse;
  border-spacing: 0;
}
td,
th {
  padding: 0;
}
thead th {
    border-bottom: 1px solid black;
    background-color: white;
}
tr:nth-child(odd){
  background-color: rgb(248,248,248);
}


/*
* Skeleton V2.0.4
* Copyright 2014, Dave Gamache
* www.getskeleton.com
* Free to use under the MIT license.
* http://www.opensource.org/licenses/mit-license.php
* 12/29/2014
*/
.container {
  position: relative;
  width: 100%;
  max-width: 960px;
  margin: 0 auto;
  padding: 0 20px;
  box-sizing: border-box; }
.column,
.columns {
  width: 100%;
  float: left;
  box-sizing: border-box; }
@media (min-width: 400px) {
  .container {
    width: 85%;
    padding: 0; }
}
@media (min-width: 550px) {
  .container {
    width: 80%; }
  .column,
  .columns {
    margin-left: 4%; }
  .column:first-child,
  .columns:first-child {
    margin-left: 0; }

  .one.column,
  .one.columns                    { width: 4.66666666667%; }
  .two.columns                    { width: 13.3333333333%; }
  .three.columns                  { width: 22%;            }
  .four.columns                   { width: 30.6666666667%; }
  .five.columns                   { width: 39.3333333333%; }
  .six.columns                    { width: 48%;            }
  .seven.columns                  { width: 56.6666666667%; }
  .eight.columns                  { width: 65.3333333333%; }
  .nine.columns                   { width: 74.0%;          }
  .ten.columns                    { width: 82.6666666667%; }
  .eleven.columns                 { width: 91.3333333333%; }
  .twelve.columns                 { width: 100%; margin-left: 0; }

  .one-third.column               { width: 30.6666666667%; }
  .two-thirds.column              { width: 65.3333333333%; }

  .one-half.column                { width: 48%; }

  /* Offsets */
  .offset-by-one.column,
  .offset-by-one.columns          { margin-left: 8.66666666667%; }
  .offset-by-two.column,
  .offset-by-two.columns          { margin-left: 17.3333333333%; }
  .offset-by-three.column,
  .offset-by-three.columns        { margin-left: 26%;            }
  .offset-by-four.column,
  .offset-by-four.columns         { margin-left: 34.6666666667%; }
  .offset-by-five.column,
  .offset-by-five.columns         { margin-left: 43.3333333333%; }
  .offset-by-six.column,
  .offset-by-six.columns          { margin-left: 52%;            }
  .offset-by-seven.column,
  .offset-by-seven.columns        { margin-left: 60.6666666667%; }
  .offset-by-eight.column,
  .offset-by-eight.columns        { margin-left: 69.3333333333%; }
  .offset-by-nine.column,
  .offset-by-nine.columns         { margin-left: 78.0%;          }
  .offset-by-ten.column,
  .offset-by-ten.columns          { margin-left: 86.6666666667%; }
  .offset-by-eleven.column,
  .offset-by-eleven.columns       { margin-left: 95.3333333333%; }

  .offset-by-one-third.column,
  .offset-by-one-third.columns    { margin-left: 34.6666666667%; }
  .offset-by-two-thirds.column,
  .offset-by-two-thirds.columns   { margin-left: 69.3333333333%; }

  .offset-by-one-half.column,
  .offset-by-one-half.columns     { margin-left: 52%; }

}
html {
  font-size: 62.5%; }
body {
  font-size: 1.5em; /* currently ems cause chrome bug misinterpreting rems on body element */
  line-height: 1.6;
  font-weight: 400;
  font-family: "Raleway", "HelveticaNeue", "Helvetica Neue", Helvetica, Arial, sans-serif;
  color: #222; }
h1, h2, h3, h4, h5, h6 {
  margin-top: 0;
  margin-bottom: 2rem;
  font-weight: 300; }
h1 { font-size: 3.6rem; line-height: 1.2;  letter-spacing: -.1rem;}
h2 { font-size: 3.4rem; line-height: 1.25; letter-spacing: -.1rem; }
h3 { font-size: 3.2rem; line-height: 1.3;  letter-spacing: -.1rem; }
h4 { font-size: 2.8rem; line-height: 1.35; letter-spacing: -.08rem; }
h5 { font-size: 2.4rem; line-height: 1.5;  letter-spacing: -.05rem; }
h6 { font-size: 1.5rem; line-height: 1.6;  letter-spacing: 0; }

p {
  margin-top: 0; }
a {
  color: #1EAEDB; }
a:hover {
  color: #0FA0CE; }
.button,
button,
input[type="submit"],
input[type="reset"],
input[type="button"] {
  display: inline-block;
  height: 38px;
  padding: 0 30px;
  color: #555;
  text-align: center;
  font-size: 11px;
  font-weight: 600;
  line-height: 38px;
  letter-spacing: .1rem;
  text-transform: uppercase;
  text-decoration: none;
  white-space: nowrap;
  background-color: transparent;
  border-radius: 4px;
  border: 1px solid #bbb;
  cursor: pointer;
  box-sizing: border-box; }
.button:hover,
button:hover,
input[type="submit"]:hover,
input[type="reset"]:hover,
input[type="button"]:hover,
.button:focus,
button:focus,
input[type="submit"]:focus,
input[type="reset"]:focus,
input[type="button"]:focus {
  color: #333;
  border-color: #888;
  outline: 0; }
.button.button-primary,
button.button-primary,
input[type="submit"].button-primary,
input[type="reset"].button-primary,
input[type="button"].button-primary {
  color: #FFF;
  background-color: #33C3F0;
  border-color: #33C3F0; }
.button.button-primary:hover,
button.button-primary:hover,
input[type="submit"].button-primary:hover,
input[type="reset"].button-primary:hover,
input[type="button"].button-primary:hover,
.button.button-primary:focus,
button.button-primary:focus,
input[type="submit"].button-primary:focus,
input[type="reset"].button-primary:focus,
input[type="button"].button-primary:focus {
  color: #FFF;
  background-color: #1EAEDB;
  border-color: #1EAEDB; }
input[type="email"],
input[type="number"],
input[type="search"],
input[type="text"],
input[type="tel"],
input[type="url"],
input[type="password"],
textarea,
select {
  height: 38px;
  padding: 6px 10px; /* The 6px vertically centers text on FF, ignored by Webkit */
  background-color: #fff;
  border: 1px solid #D1D1D1;
  border-radius: 4px;
  box-shadow: none;
  box-sizing: border-box; }
/* Removes awkward default styles on some inputs for iOS */
input[type="email"],
input[type="number"],
input[type="search"],
input[type="text"],
input[type="tel"],
input[type="url"],
input[type="password"],
textarea {
  -webkit-appearance: none;
     -moz-appearance: none;
          appearance: none; }
textarea {
  min-height: 65px;
  padding-top: 6px;
  padding-bottom: 6px; }
input[type="email"]:focus,
input[type="number"]:focus,
input[type="search"]:focus,
input[type="text"]:focus,
input[type="tel"]:focus,
input[type="url"]:focus,
input[type="password"]:focus,
textarea:focus,
select:focus {
  border: 1px solid #33C3F0;
  outline: 0; }
label,
legend {
  display: block;
  margin-bottom: .5rem;
  font-weight: 600; }
fieldset {
  padding: 0;
  border-width: 0; }
input[type="checkbox"],
input[type="radio"] {
  display: inline; }
label > .label-body {
  display: inline-block;
  margin-left: .5rem;
  font-weight: normal; }
ul {
  list-style: circle; }
ol {
  list-style: decimal; }
ul ul,
ul ol,
ol ol,
ol ul {
  margin: 1.5rem 0 1.5rem 3rem;
  font-size: 90%; }
li > p {margin : 0;}
th,
td {
  padding: 12px 15px;
  text-align: left;
  border-bottom: 1px solid #E1E1E1; }
th:first-child,
td:first-child {
  padding-left: 0; }
th:last-child,
td:last-child {
  padding-right: 0; }
button,
.button {
  margin-bottom: 1rem; }
input,
textarea,
select,
fieldset {
  margin-bottom: 1.5rem; }
pre,
blockquote,
dl,
figure,
table,
p,
ul,
ol,
form {
  margin-bottom: 1.0rem; }
.u-full-width {
  width: 100%;
  box-sizing: border-box; }
.u-max-full-width {
  max-width: 100%;
  box-sizing: border-box; }
.u-pull-right {
  float: right; }
.u-pull-left {
  float: left; }
hr {
  margin-top: 3rem;
  margin-bottom: 3.5rem;
  border-width: 0;
  border-top: 1px solid #E1E1E1; }
.container:after,
.row:after,
.u-cf {
  content: "";
  display: table;
  clear: both; }

pre {
  display: block;
  padding: 9.5px;
  margin: 0 0 10px;
  font-size: 13px;
  line-height: 1.42857143;
  word-break: break-all;
  word-wrap: break-word;
  border: 1px solid #ccc;
  border-radius: 4px;
}

pre.hljl {
  margin: 0 0 10px;
  display: block;
  background: #f5f5f5;
  border-radius: 4px;
  padding : 5px;
}

pre.output {
  background: #ffffff;
}

pre.code {
  background: #ffffff;
}

pre.julia-error {
  color : red
}

code,
kbd,
pre,
samp {
  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
  font-size: 0.9em;
}


@media (min-width: 400px) {}
@media (min-width: 550px) {}
@media (min-width: 750px) {}
@media (min-width: 1000px) {}
@media (min-width: 1200px) {}

h1.title {margin-top : 20px}
img {max-width : 100%}
div.title {text-align: center;}

  </style>
</HEAD>

<BODY>
  <div class ="container">
    <div class = "row">
      <div class = "col-md-12 twelve columns">
        <div class="title">
          <h1 class="title">Introduction to Scientific Machine Learning through Physics-Informed Neural Networks</h1>
          <h5>Chris Rackauckas</h5>
          <h5>September 8th, 2020</h5>
        </div>

        <p>Here we will start to dig into what scientific machine learning is all about by looking at physics-informed neural networks. Let&#39;s start by understanding what a neural network really is, why they are used, and what kinds of problems that they solve, and then we will use this understanding of a neural network to see how to solve ordinary differential equations with neural networks. For there, we will use this method to regularize neural networks with physical equations, the aforementioned physics-informed neural network, and see how to define neural network architectures that satisfy physical constraints to improve the training process.</p>
<h2>Getting Started with Machine Learning: Adding Flux</h2>
<p>To add Flux.jl we would do:</p>



<pre class='hljl'>
<span class='hljl-p'>]</span><span class='hljl-n'>add</span><span class='hljl-t'> </span><span class='hljl-n'>Flux</span>
</pre>


<p>To then use the package we will then use the <code>using</code> command:</p>


<pre class='hljl'>
<span class='hljl-k'>using</span><span class='hljl-t'> </span><span class='hljl-n'>Flux</span>
</pre>



<p>If you prefer to namespace all commands &#40;like is normally done in Python, i.e. <code>Flux.gradient</code> instead of <code>gradient</code>&#41;, you can use the command:</p>



<pre class='hljl'>
<span class='hljl-k'>import</span><span class='hljl-t'> </span><span class='hljl-n'>Flux</span>
</pre>


<p>Note that the installation and precompilation of these packages will occur at the <code>add</code> and first <code>using</code> phases, so they may take awhile &#40;subsequent uses will utilize the precompiled form and take a lot less time&#33;&#41;</p>
<h2>What is a Neural Network?</h2>
<p>A neural network is a function:</p>
<p class="math">\[
\text{NN}(x) = W_3\sigma_2(W_2\sigma_1(W_1x + b_1) + b_2) + b_3
\]</p>
<p>where we can change the number of layers &#40;<code>&#40;W_i,b_i&#41;</code>&#41; as necesary. Let&#39;s assume we want to approximate some <span class="math">$R^{10} \rightarrow R^5$</span> function. To do this we need to make sure that we start with 10 inputs and arrive at 5 outputs. If we want a bigger middle layer for example, we can do something like &#40;10,32,32,5&#41;. Size changing occurs at the site of the matrix multiplication, which means that we want a 32x10 matrix, then a 32x32 matrix, and finally a 5x32 matrix. This neural network would look like:</p>


<pre class='hljl'>
<span class='hljl-n'>W</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-p'>[</span><span class='hljl-nf'>randn</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-ni'>10</span><span class='hljl-p'>),</span><span class='hljl-nf'>randn</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-ni'>32</span><span class='hljl-p'>),</span><span class='hljl-nf'>randn</span><span class='hljl-p'>(</span><span class='hljl-ni'>5</span><span class='hljl-p'>,</span><span class='hljl-ni'>32</span><span class='hljl-p'>)]</span><span class='hljl-t'>
</span><span class='hljl-n'>b</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-p'>[</span><span class='hljl-nf'>zeros</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>),</span><span class='hljl-nf'>zeros</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>),</span><span class='hljl-nf'>zeros</span><span class='hljl-p'>(</span><span class='hljl-ni'>5</span><span class='hljl-p'>)]</span>
</pre>


<pre class="output">
3-element Array&#123;Array&#123;Float64,1&#125;,1&#125;:
 &#91;0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0&#93;
 &#91;0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0&#93;
 &#91;0.0, 0.0, 0.0, 0.0, 0.0&#93;
</pre>



<pre class='hljl'>
<span class='hljl-nf'>simpleNN</span><span class='hljl-p'>(</span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>W</span><span class='hljl-p'>[</span><span class='hljl-ni'>3</span><span class='hljl-p'>]</span><span class='hljl-oB'>*</span><span class='hljl-n'>tanh</span><span class='hljl-oB'>.</span><span class='hljl-p'>(</span><span class='hljl-n'>W</span><span class='hljl-p'>[</span><span class='hljl-ni'>2</span><span class='hljl-p'>]</span><span class='hljl-oB'>*</span><span class='hljl-n'>tanh</span><span class='hljl-oB'>.</span><span class='hljl-p'>(</span><span class='hljl-n'>W</span><span class='hljl-p'>[</span><span class='hljl-ni'>1</span><span class='hljl-p'>]</span><span class='hljl-oB'>*</span><span class='hljl-n'>x</span><span class='hljl-t'> </span><span class='hljl-oB'>+</span><span class='hljl-t'> </span><span class='hljl-n'>b</span><span class='hljl-p'>[</span><span class='hljl-ni'>1</span><span class='hljl-p'>])</span><span class='hljl-t'> </span><span class='hljl-oB'>+</span><span class='hljl-t'> </span><span class='hljl-n'>b</span><span class='hljl-p'>[</span><span class='hljl-ni'>2</span><span class='hljl-p'>])</span><span class='hljl-t'> </span><span class='hljl-oB'>+</span><span class='hljl-t'> </span><span class='hljl-n'>b</span><span class='hljl-p'>[</span><span class='hljl-ni'>3</span><span class='hljl-p'>]</span><span class='hljl-t'>
</span><span class='hljl-nf'>simpleNN</span><span class='hljl-p'>(</span><span class='hljl-nf'>rand</span><span class='hljl-p'>(</span><span class='hljl-ni'>10</span><span class='hljl-p'>))</span>
</pre>


<pre class="output">
5-element Array&#123;Float64,1&#125;:
  9.19670526155091
 -4.35567103243751
  4.6071305070688
 -6.3624003407921474
  8.705698375280907
</pre>


<p>This is our direct definition of a neural network. Notice that we choose to use <code>tanh</code> as our <strong>activation function</strong> between the layers.</p>
<h3>Defining Neural Networks with Flux.jl</h3>
<p>One of the main deep learning libraries in Julia is Flux.jl. Flux is an interesting library for scientific machine learning because it is built on top of language-wide <strong>automatic differentiation</strong> libraries, giving rise to a programming paradigm known as <strong>differentiable programming</strong>, which means that one can write a program in a manner that it has easily accessible fast derivatives. However, due to being built on a differentiable programming base, the underlying functionality is simply standard Julia code,</p>
<p>To learn how to use the library, consult the documentation. A Google search will bring up the <a href="https://github.com/FluxML/Flux.jl">Flux.jl Github repository</a>. From there, the blue link on the README brings you to <a href="https://fluxml.ai/Flux.jl/stable/">the package documentation</a>. This is common through Julia so it&#39;s a good habit to learn&#33;</p>
<p>In the documentation you will find that the way a neural network is defined is through a <code>Chain</code> of layers. A <code>Dense</code> layer is the kind we defined above, which is given by an input size, an output size, and an activation function. For example, the following recreates the neural network that we had above:</p>


<pre class='hljl'>
<span class='hljl-k'>using</span><span class='hljl-t'> </span><span class='hljl-n'>Flux</span><span class='hljl-t'>
</span><span class='hljl-n'>NN2</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>Chain</span><span class='hljl-p'>(</span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>10</span><span class='hljl-p'>,</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-n'>tanh</span><span class='hljl-p'>),</span><span class='hljl-t'>
           </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-n'>tanh</span><span class='hljl-p'>),</span><span class='hljl-t'>
           </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-ni'>5</span><span class='hljl-p'>))</span><span class='hljl-t'>
</span><span class='hljl-nf'>NN2</span><span class='hljl-p'>(</span><span class='hljl-nf'>rand</span><span class='hljl-p'>(</span><span class='hljl-ni'>10</span><span class='hljl-p'>))</span>
</pre>


<pre class="output">
5-element Array&#123;Float32,1&#125;:
  0.30022457
  0.37106496
  0.10870228
 -0.7765441
  0.23717026
</pre>


<p>Notice that Flux.jl as a library is written in pure Julia, which means that every piece of this syntax is just sugar over some Julia code that we can specialize ourselves &#40;this is the advantage of having a language fast enough for the implementation of the library and the use of the library&#33;&#41;</p>
<p>For example, the activation function is just a scalar Julia function. If we wanted to replace it by something like the quadratic function, we can just use an <strong>anonymous function</strong> to define the scalar function we would like to use:</p>


<pre class='hljl'>
<span class='hljl-n'>NN3</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>Chain</span><span class='hljl-p'>(</span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>10</span><span class='hljl-p'>,</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-n'>x</span><span class='hljl-oB'>-&gt;</span><span class='hljl-n'>x</span><span class='hljl-oB'>^</span><span class='hljl-ni'>2</span><span class='hljl-p'>),</span><span class='hljl-t'>
            </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-n'>x</span><span class='hljl-oB'>-&gt;</span><span class='hljl-nf'>max</span><span class='hljl-p'>(</span><span class='hljl-ni'>0</span><span class='hljl-p'>,</span><span class='hljl-n'>x</span><span class='hljl-p'>)),</span><span class='hljl-t'>
            </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-ni'>5</span><span class='hljl-p'>))</span><span class='hljl-t'>
</span><span class='hljl-nf'>NN3</span><span class='hljl-p'>(</span><span class='hljl-nf'>rand</span><span class='hljl-p'>(</span><span class='hljl-ni'>10</span><span class='hljl-p'>))</span>
</pre>


<pre class="output">
5-element Array&#123;Float32,1&#125;:
  0.042253576
 -0.06601792
  0.089549884
  0.09319328
 -0.109868415
</pre>


<p>The second activation function there is what&#39;s known as a <code>relu</code>. A <code>relu</code> can be good to use because it&#39;s an exceptionally operation and satisfies a form of the UAT. However, a downside is that its derivative is not continuous, which could impact the numerical properties of some algorithms, and thus it&#39;s widely used throughout standard machine learning but we&#39;ll see reasons why it may be disadvantageous in some cases in scientific machine learning.</p>
<h3>Digging into the Construction of a Neural Network Library</h3>
<p>Again, as mentioned before, this neural network <code>NN2</code> is simply a function:</p>


<pre class='hljl'>
<span class='hljl-nf'>simpleNN</span><span class='hljl-p'>(</span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>W</span><span class='hljl-p'>[</span><span class='hljl-ni'>3</span><span class='hljl-p'>]</span><span class='hljl-oB'>*</span><span class='hljl-n'>tanh</span><span class='hljl-oB'>.</span><span class='hljl-p'>(</span><span class='hljl-n'>W</span><span class='hljl-p'>[</span><span class='hljl-ni'>2</span><span class='hljl-p'>]</span><span class='hljl-oB'>*</span><span class='hljl-n'>tanh</span><span class='hljl-oB'>.</span><span class='hljl-p'>(</span><span class='hljl-n'>W</span><span class='hljl-p'>[</span><span class='hljl-ni'>1</span><span class='hljl-p'>]</span><span class='hljl-oB'>*</span><span class='hljl-n'>x</span><span class='hljl-t'> </span><span class='hljl-oB'>+</span><span class='hljl-t'> </span><span class='hljl-n'>b</span><span class='hljl-p'>[</span><span class='hljl-ni'>1</span><span class='hljl-p'>])</span><span class='hljl-t'> </span><span class='hljl-oB'>+</span><span class='hljl-t'> </span><span class='hljl-n'>b</span><span class='hljl-p'>[</span><span class='hljl-ni'>2</span><span class='hljl-p'>])</span><span class='hljl-t'> </span><span class='hljl-oB'>+</span><span class='hljl-t'> </span><span class='hljl-n'>b</span><span class='hljl-p'>[</span><span class='hljl-ni'>3</span><span class='hljl-p'>]</span>
</pre>


<pre class="output">
simpleNN &#40;generic function with 1 method&#41;
</pre>


<p>Let&#39;s dig into the library and see how that&#39;s represented and really understand the construction of a deep learning library. First, let&#39;s figure out where <code>Dense</code> comes from and what it does.</p>


<pre class='hljl'>
<span class='hljl-k'>using</span><span class='hljl-t'> </span><span class='hljl-n'>InteractiveUtils</span><span class='hljl-t'>
</span><span class='hljl-nd'>@which</span><span class='hljl-t'> </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>10</span><span class='hljl-p'>,</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-n'>tanh</span><span class='hljl-p'>)</span>
</pre>



(::<b>Type{Flux.Dense}</b>)(in::<b>Integer</b>, out::<b>Integer</b>, σ; <i>initW, initb</i>) in Flux at <a href="file://C:/Users/accou/.julia/packages/Flux/05b38/src/layers/basic.jl" target="_blank">C:\Users\accou\.julia\packages\Flux\05b38\src\layers\basic.jl:114</a>

<p>If we go to that spot of the documentation, we find the following.</p>



<pre class='hljl'>
<span class='hljl-k'>struct</span><span class='hljl-t'> </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>{</span><span class='hljl-n'>F</span><span class='hljl-p'>,</span><span class='hljl-n'>S</span><span class='hljl-oB'>&lt;:</span><span class='hljl-n'>AbstractArray</span><span class='hljl-p'>,</span><span class='hljl-n'>T</span><span class='hljl-oB'>&lt;:</span><span class='hljl-n'>AbstractArray</span><span class='hljl-p'>}</span><span class='hljl-t'>
  </span><span class='hljl-n'>W</span><span class='hljl-oB'>::</span><span class='hljl-n'>S</span><span class='hljl-t'>
  </span><span class='hljl-n'>b</span><span class='hljl-oB'>::</span><span class='hljl-n'>T</span><span class='hljl-t'>
  </span><span class='hljl-n'>σ</span><span class='hljl-oB'>::</span><span class='hljl-n'>F</span><span class='hljl-t'>
</span><span class='hljl-k'>end</span><span class='hljl-t'>

</span><span class='hljl-k'>function</span><span class='hljl-t'> </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-kp'>in</span><span class='hljl-oB'>::</span><span class='hljl-n'>Integer</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>out</span><span class='hljl-oB'>::</span><span class='hljl-n'>Integer</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>σ</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>identity</span><span class='hljl-p'>;</span><span class='hljl-t'>
               </span><span class='hljl-n'>initW</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>glorot_uniform</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>initb</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>zeros</span><span class='hljl-p'>)</span><span class='hljl-t'>
  </span><span class='hljl-k'>return</span><span class='hljl-t'> </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-nf'>initW</span><span class='hljl-p'>(</span><span class='hljl-n'>out</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-kp'>in</span><span class='hljl-p'>),</span><span class='hljl-t'> </span><span class='hljl-nf'>initb</span><span class='hljl-p'>(</span><span class='hljl-n'>out</span><span class='hljl-p'>),</span><span class='hljl-t'> </span><span class='hljl-n'>σ</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-k'>end</span><span class='hljl-t'>

</span><span class='hljl-k'>function</span><span class='hljl-t'> </span><span class='hljl-p'>(</span><span class='hljl-n'>a</span><span class='hljl-oB'>::</span><span class='hljl-n'>Dense</span><span class='hljl-p'>)(</span><span class='hljl-n'>x</span><span class='hljl-oB'>::</span><span class='hljl-n'>AbstractArray</span><span class='hljl-p'>)</span><span class='hljl-t'>
  </span><span class='hljl-n'>W</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>b</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>σ</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>a</span><span class='hljl-oB'>.</span><span class='hljl-n'>W</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>a</span><span class='hljl-oB'>.</span><span class='hljl-n'>b</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>a</span><span class='hljl-oB'>.</span><span class='hljl-n'>σ</span><span class='hljl-t'>
  </span><span class='hljl-n'>σ</span><span class='hljl-oB'>.</span><span class='hljl-p'>(</span><span class='hljl-n'>W</span><span class='hljl-oB'>*</span><span class='hljl-n'>x</span><span class='hljl-t'> </span><span class='hljl-oB'>.+</span><span class='hljl-t'> </span><span class='hljl-n'>b</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-k'>end</span>
</pre>


<p>First, <code>Dense</code> defines a struct in Julia. This struct just holds a weight matrix <code>W</code>, a bias vector <code>b</code>, and an activation function <code>σ</code>. The function called <code>Dense</code> is what&#39;s known as an <strong>outer constructor</strong> which defines how the <code>Dense</code> type is built. If you give it two integers &#40;and optionally an activation function which defaults to <code>identity</code>&#41;, then what it will do is take random initial <code>W</code> and <code>b</code> matrices &#40;according to the <code>glorot_uniform</code> distribution for <code>W</code> and <code>zeros</code> for <code>b</code>&#41;, and then it will build the type with those matrices.</p>
<p>The last portion might be new. This is known as a <strong>callable struct</strong>, or a functor. It defines the dispatch for how calls work on the struct. As a quick demonstration, let&#39;s define a type <code>A</code> with a field <code>x</code>, and then make instances of <code>A</code> be the function <code>x&#43;y</code>:</p>


<pre class='hljl'>
<span class='hljl-k'>struct</span><span class='hljl-t'> </span><span class='hljl-n'>A</span><span class='hljl-t'>
  </span><span class='hljl-n'>x</span><span class='hljl-t'>
</span><span class='hljl-k'>end</span><span class='hljl-t'>

</span><span class='hljl-p'>(</span><span class='hljl-n'>a</span><span class='hljl-oB'>::</span><span class='hljl-n'>A</span><span class='hljl-p'>)(</span><span class='hljl-n'>y</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>a</span><span class='hljl-oB'>.</span><span class='hljl-n'>x</span><span class='hljl-oB'>+</span><span class='hljl-n'>y</span><span class='hljl-t'>
</span><span class='hljl-n'>a</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>A</span><span class='hljl-p'>(</span><span class='hljl-ni'>2</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-nf'>a</span><span class='hljl-p'>(</span><span class='hljl-ni'>3</span><span class='hljl-p'>)</span>
</pre>


<pre class="output">
5
</pre>


<p>If you&#39;re familiar with object-oriented programming, this is similar to using an object in a way that references the <code>self</code>, though it&#39;s a bit more general due to allowing dispatching, i.e. this can then dependent on the input types as well.</p>
<p>So let&#39;s look at that <code>Dense</code> call with this in mind:</p>



<pre class='hljl'>
<span class='hljl-k'>function</span><span class='hljl-t'> </span><span class='hljl-p'>(</span><span class='hljl-n'>a</span><span class='hljl-oB'>::</span><span class='hljl-n'>Dense</span><span class='hljl-p'>)(</span><span class='hljl-n'>x</span><span class='hljl-oB'>::</span><span class='hljl-n'>AbstractArray</span><span class='hljl-p'>)</span><span class='hljl-t'>
  </span><span class='hljl-n'>W</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>b</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>σ</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>a</span><span class='hljl-oB'>.</span><span class='hljl-n'>W</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>a</span><span class='hljl-oB'>.</span><span class='hljl-n'>b</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>a</span><span class='hljl-oB'>.</span><span class='hljl-n'>σ</span><span class='hljl-t'>
  </span><span class='hljl-n'>σ</span><span class='hljl-oB'>.</span><span class='hljl-p'>(</span><span class='hljl-n'>W</span><span class='hljl-oB'>*</span><span class='hljl-n'>x</span><span class='hljl-t'> </span><span class='hljl-oB'>.+</span><span class='hljl-t'> </span><span class='hljl-n'>b</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-k'>end</span>
</pre>


<p>This means that <code>Dense</code> is a function that takes in an <code>x</code> and computes <code>σ.&#40;W*x.&#43;b&#41;</code>, which is precisely how we defined the layer before&#33; To see that this is just a function, let&#39;s call it directly:</p>


<pre class='hljl'>
<span class='hljl-n'>f</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-n'>tanh</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-nf'>f</span><span class='hljl-p'>(</span><span class='hljl-nf'>rand</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>))</span>
</pre>


<pre class="output">
32-element Array&#123;Float32,1&#125;:
  0.76604545
 -0.5101571
 -0.7378224
 -0.20090216
  0.61065334
 -0.6672427
  0.10194461
 -0.09060735
 -0.48079655
 -0.8216238
  ⋮
  0.3860857
  0.44599465
  0.15579815
 -0.08692834
  0.26560986
  0.29243132
 -0.021147825
 -0.41147602
  0.21349494
</pre>


<p>So okay, <code>Dense</code> objects are just functions that have weight and bias matrices inside of them. Now what does <code>Chain</code> do?</p>


<pre class='hljl'>
<span class='hljl-nd'>@which</span><span class='hljl-t'> </span><span class='hljl-nf'>Chain</span><span class='hljl-p'>(</span><span class='hljl-ni'>1</span><span class='hljl-p'>,</span><span class='hljl-ni'>2</span><span class='hljl-p'>,</span><span class='hljl-ni'>3</span><span class='hljl-p'>)</span>
</pre>



(::<b>Type{Flux.Chain}</b>)(xs...) in Flux at <a href="file://C:/Users/accou/.julia/packages/Flux/05b38/src/layers/basic.jl" target="_blank">C:\Users\accou\.julia\packages\Flux\05b38\src\layers\basic.jl:27</a>

<p>gives us:</p>



<pre class='hljl'>
<span class='hljl-k'>struct</span><span class='hljl-t'> </span><span class='hljl-nf'>Chain</span><span class='hljl-p'>{</span><span class='hljl-n'>T</span><span class='hljl-oB'>&lt;:</span><span class='hljl-n'>Tuple</span><span class='hljl-p'>}</span><span class='hljl-t'>
  </span><span class='hljl-n'>layers</span><span class='hljl-oB'>::</span><span class='hljl-n'>T</span><span class='hljl-t'>
  </span><span class='hljl-nf'>Chain</span><span class='hljl-p'>(</span><span class='hljl-n'>xs</span><span class='hljl-oB'>...</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>new</span><span class='hljl-p'>{</span><span class='hljl-nf'>typeof</span><span class='hljl-p'>(</span><span class='hljl-n'>xs</span><span class='hljl-p'>)}(</span><span class='hljl-n'>xs</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-k'>end</span><span class='hljl-t'>

</span><span class='hljl-nf'>applychain</span><span class='hljl-p'>(</span><span class='hljl-oB'>::</span><span class='hljl-nf'>Tuple</span><span class='hljl-p'>{},</span><span class='hljl-t'> </span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>x</span><span class='hljl-t'>
</span><span class='hljl-nf'>applychain</span><span class='hljl-p'>(</span><span class='hljl-n'>fs</span><span class='hljl-oB'>::</span><span class='hljl-n'>Tuple</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>applychain</span><span class='hljl-p'>(</span><span class='hljl-nf'>tail</span><span class='hljl-p'>(</span><span class='hljl-n'>fs</span><span class='hljl-p'>),</span><span class='hljl-t'> </span><span class='hljl-nf'>first</span><span class='hljl-p'>(</span><span class='hljl-n'>fs</span><span class='hljl-p'>)(</span><span class='hljl-n'>x</span><span class='hljl-p'>))</span><span class='hljl-t'>

</span><span class='hljl-p'>(</span><span class='hljl-n'>c</span><span class='hljl-oB'>::</span><span class='hljl-n'>Chain</span><span class='hljl-p'>)(</span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>applychain</span><span class='hljl-p'>(</span><span class='hljl-n'>c</span><span class='hljl-oB'>.</span><span class='hljl-n'>layers</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>x</span><span class='hljl-p'>)</span>
</pre>


<p>Let&#39;s now dig into this. The <code>...</code> is known that the <strong>slurp operator</strong>, which allows for &quot;slurping up&quot; multiple arguments into a single object <code>xs</code>. For example:</p>


<pre class='hljl'>
<span class='hljl-nf'>slurper</span><span class='hljl-p'>(</span><span class='hljl-n'>xs</span><span class='hljl-oB'>...</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nd'>@show</span><span class='hljl-t'> </span><span class='hljl-n'>xs</span><span class='hljl-t'>
</span><span class='hljl-nf'>slurper</span><span class='hljl-p'>(</span><span class='hljl-ni'>1</span><span class='hljl-p'>,</span><span class='hljl-ni'>2</span><span class='hljl-p'>,</span><span class='hljl-ni'>3</span><span class='hljl-p'>,</span><span class='hljl-ni'>4</span><span class='hljl-p'>,</span><span class='hljl-ni'>5</span><span class='hljl-p'>)</span>
</pre>


<pre class="output">
xs &#61; &#40;1, 2, 3, 4, 5&#41;
&#40;1, 2, 3, 4, 5&#41;
</pre>


<p>We see that slurps the inputs up into a <code>Tuple</code>, which is an immutable data store. &#40;Note: Tuples are stack-allocated if inferred and is the internal data store of the compiler itself, and compiler inference can know exactly the size and the type of each individual object, so this does not have an overhead if fully inferred&#41;.</p>
<p>The function <code>Chain&#40;xs...&#41; &#61; new&#123;typeof&#40;xs&#41;&#125;&#40;xs&#41;</code> is an <strong>inner constructor</strong> which builds a new instance of <code>Chain</code> where <code>layers</code> is a tuple of the inputs. This means that in our case where we put a bunch of <code>Dense</code> inside of there, <code>layers</code> is a tuple of functions. What does <code>Chain</code> do? Let&#39;s look at its call:</p>



<pre class='hljl'>
<span class='hljl-p'>(</span><span class='hljl-n'>c</span><span class='hljl-oB'>::</span><span class='hljl-n'>Chain</span><span class='hljl-p'>)(</span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>applychain</span><span class='hljl-p'>(</span><span class='hljl-n'>c</span><span class='hljl-oB'>.</span><span class='hljl-n'>layers</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>x</span><span class='hljl-p'>)</span>
</pre>


<p>This takes the tuple of functions and then does <code>applychain</code> on it.</p>



<pre class='hljl'>
<span class='hljl-nf'>applychain</span><span class='hljl-p'>(</span><span class='hljl-oB'>::</span><span class='hljl-nf'>Tuple</span><span class='hljl-p'>{},</span><span class='hljl-t'> </span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>x</span><span class='hljl-t'>
</span><span class='hljl-nf'>applychain</span><span class='hljl-p'>(</span><span class='hljl-n'>fs</span><span class='hljl-oB'>::</span><span class='hljl-n'>Tuple</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>applychain</span><span class='hljl-p'>(</span><span class='hljl-nf'>tail</span><span class='hljl-p'>(</span><span class='hljl-n'>fs</span><span class='hljl-p'>),</span><span class='hljl-t'> </span><span class='hljl-nf'>first</span><span class='hljl-p'>(</span><span class='hljl-n'>fs</span><span class='hljl-p'>)(</span><span class='hljl-n'>x</span><span class='hljl-p'>))</span>
</pre>


<p><code>applychain</code> is a recursive function which applies the first element of the tuple onto <code>x</code>, then it calls <code>applychain</code> to call the second function onto <code>x</code>, repeatedly until there are no more functions in which case it returns <code>x</code>. This means that <code>applychain</code> is simply doing <code>h&#40;g&#40;f&#40;x&#41;&#41;&#41;</code> on the tuple of functions <code>&#40;f,g,h&#41;</code>&#33; We can thus see that this library function is exactly equivalent to the neural network we defined by hand, just put together in a different form to give a nice user interface.</p>
<h4>Detail: Recursion?</h4>
<p>But there&#39;s one more detail... why recursion? If you define a function, look at its type:</p>


<pre class='hljl'>
<span class='hljl-nf'>ff</span><span class='hljl-p'>(</span><span class='hljl-n'>x</span><span class='hljl-p'>,</span><span class='hljl-n'>y</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-ni'>2</span><span class='hljl-n'>x</span><span class='hljl-oB'>+</span><span class='hljl-n'>y</span><span class='hljl-t'>
</span><span class='hljl-nf'>typeof</span><span class='hljl-p'>(</span><span class='hljl-n'>ff</span><span class='hljl-p'>)</span>
</pre>


<pre class="output">
typeof&#40;Main.##WeaveSandBox#253.ff&#41;
</pre>


<p>Notice that its type is simply <code>typeof&#40;ff&#41;</code> which is unique to the function, i.e. every single function is its own struct. In fact, given what we just learned, it wouldn&#39;t be a surprise to learn that is exactly what a function is in Julia&#33; A function definition lowers at the parser level to something like:</p>



<pre class='hljl'>
<span class='hljl-k'>struct</span><span class='hljl-t'> </span><span class='hljl-n'>ff2</span><span class='hljl-t'> </span><span class='hljl-oB'>&lt;:</span><span class='hljl-t'> </span><span class='hljl-n'>Function</span><span class='hljl-t'> </span><span class='hljl-k'>end</span><span class='hljl-t'>
</span><span class='hljl-p'>(</span><span class='hljl-n'>_</span><span class='hljl-oB'>::</span><span class='hljl-n'>ff2</span><span class='hljl-p'>)(</span><span class='hljl-n'>x</span><span class='hljl-p'>,</span><span class='hljl-n'>y</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-ni'>2</span><span class='hljl-n'>x</span><span class='hljl-t'> </span><span class='hljl-oB'>+</span><span class='hljl-t'> </span><span class='hljl-n'>y</span><span class='hljl-t'>
</span><span class='hljl-kd'>const</span><span class='hljl-t'> </span><span class='hljl-n'>ff</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>ff2</span><span class='hljl-p'>()</span>
</pre>


<p>This means that the primitive operation here that everything really comes down to is calls on structs. Why is this done with unique <strong>singleton</strong> types? &#40;Singleton types are types where every instance is equivalent&#41;. Well, if we want the compiler to be able to optimize with respect to which function we are handling inside of another function, then we need &quot;what function we are dealing with&quot; as compile-time information, which necessitates being type information.</p>
<p>Tuples are contravariant and heterogeneously typed with a parameter per internal object. For example:</p>


<pre class='hljl'>
<span class='hljl-n'>tup</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-p'>(</span><span class='hljl-nfB'>1.0</span><span class='hljl-p'>,</span><span class='hljl-ni'>1</span><span class='hljl-p'>,</span><span class='hljl-s'>&quot;1&quot;</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-nf'>typeof</span><span class='hljl-p'>(</span><span class='hljl-n'>tup</span><span class='hljl-p'>)</span>
</pre>


<pre class="output">
Tuple&#123;Float64,Int64,String&#125;
</pre>


<p>This means that it is possible to infer outputs of a tuple even if it&#39;s heterogeneously typed by making good use of constant literals. For example, the expression <code>tup&#91;1&#93;</code> will be inferred to have the output <code>Float64</code>. However, note that if <code>i</code> is not a compile-time constant, then <code>tup&#91;i&#93;</code> cannot be inferred since, given what the compiler knows, the output could be either a <code>Float64</code>, an <code>Int64</code>, or a <code>String</code>.</p>
<p>So now let&#39;s think back to our tuple of functions. By what we described before, <code>tup &#61; &#40;f,g,h&#41;</code> is going to have a different type for each of the functions and thus could not specialize on the inputs if we used <code>tup&#91;i&#93;</code>. Therefore:</p>



<pre class='hljl'>
<span class='hljl-k'>for</span><span class='hljl-t'> </span><span class='hljl-n'>i</span><span class='hljl-t'> </span><span class='hljl-kp'>in</span><span class='hljl-t'> </span><span class='hljl-ni'>1</span><span class='hljl-oB'>:</span><span class='hljl-nf'>length</span><span class='hljl-p'>(</span><span class='hljl-n'>tup</span><span class='hljl-p'>)</span><span class='hljl-t'>
  </span><span class='hljl-n'>x</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>tup</span><span class='hljl-p'>[</span><span class='hljl-n'>i</span><span class='hljl-p'>](</span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-k'>end</span>
</pre>


<p>would be slow &#40;if the function call cost is small compared to the dispatch cost of about 100ns. This is not always the case, but should be considered in many instances&#33;&#41;. So how can you get around it? Well, if everything was constant literals then this would specialize:</p>



<pre class='hljl'>
<span class='hljl-n'>tup</span><span class='hljl-p'>[</span><span class='hljl-ni'>3</span><span class='hljl-p'>](</span><span class='hljl-n'>tup</span><span class='hljl-p'>[</span><span class='hljl-ni'>2</span><span class='hljl-p'>](</span><span class='hljl-n'>tup</span><span class='hljl-p'>[</span><span class='hljl-ni'>1</span><span class='hljl-p'>](</span><span class='hljl-n'>x</span><span class='hljl-p'>)))</span>
</pre>


<p>would fully specialize and infer, and the compiler would have full knowledge of the entire call chain as if it were written out as straightline code. Now if we look at the recursion again:</p>



<pre class='hljl'>
<span class='hljl-nf'>applychain</span><span class='hljl-p'>(</span><span class='hljl-oB'>::</span><span class='hljl-nf'>Tuple</span><span class='hljl-p'>{},</span><span class='hljl-t'> </span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>x</span><span class='hljl-t'>
</span><span class='hljl-nf'>applychain</span><span class='hljl-p'>(</span><span class='hljl-n'>fs</span><span class='hljl-oB'>::</span><span class='hljl-n'>Tuple</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>applychain</span><span class='hljl-p'>(</span><span class='hljl-nf'>tail</span><span class='hljl-p'>(</span><span class='hljl-n'>fs</span><span class='hljl-p'>),</span><span class='hljl-t'> </span><span class='hljl-nf'>first</span><span class='hljl-p'>(</span><span class='hljl-n'>fs</span><span class='hljl-p'>)(</span><span class='hljl-n'>x</span><span class='hljl-p'>))</span>
</pre>


<p>we see that, at compile-time, we know that <code>typeof&#40;&#40;f,g,h&#41;&#41; &#61; Tuple&#123;typeof&#40;f&#41;,typeof&#40;g&#41;,typeof&#40;h&#41;&#125;</code>, and so we know that the first <code>first&#40;fs&#41;</code> will be <code>f</code>, and can specialize on this. We know then that <code>tail&#40;fs&#41;</code> has to be the <code>&#40;g,h&#41;</code> and so then we recurse and know that <code>g</code> is first and ... This means that this scheme is equivalent to have written out <code>xs&#91;3&#93;&#40;xs&#91;2&#93;&#40;xs&#91;1&#93;&#40;x&#41;&#41;&#41;</code> and is thus generating code perfectly specialized to the order and amount of functions we had put into the <code>Chain</code>. This kind of abstraction, an abstraction where all of the overhead compiles away, is known as a <strong>zero-cost abstraction</strong>.</p>
<p>&#40;Note that technically, there is a cost since the compiler has to unravel this chain of events.&#41;</p>
<h3>Training Neural Networks</h3>
<p>Now let&#39;s get into training neural networks. &quot;Training&quot; a neural network is simply the process of finding weights that minimize a loss function. For example, let&#39;s say we wanted to make our neural network be the constant function <code>1</code> for any input <span class="math">$x \in [0,1]^10$</span>. We can then write the loss function:</p>


<pre class='hljl'>
<span class='hljl-n'>NN</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>Chain</span><span class='hljl-p'>(</span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>10</span><span class='hljl-p'>,</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-n'>tanh</span><span class='hljl-p'>),</span><span class='hljl-t'>
           </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-n'>tanh</span><span class='hljl-p'>),</span><span class='hljl-t'>
           </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-ni'>5</span><span class='hljl-p'>))</span><span class='hljl-t'>
</span><span class='hljl-nf'>loss</span><span class='hljl-p'>()</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>sum</span><span class='hljl-p'>(</span><span class='hljl-n'>abs2</span><span class='hljl-p'>,</span><span class='hljl-nf'>sum</span><span class='hljl-p'>(</span><span class='hljl-n'>abs2</span><span class='hljl-p'>,</span><span class='hljl-nf'>NN</span><span class='hljl-p'>(</span><span class='hljl-nf'>rand</span><span class='hljl-p'>(</span><span class='hljl-ni'>10</span><span class='hljl-p'>))</span><span class='hljl-oB'>.-</span><span class='hljl-ni'>1</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-k'>for</span><span class='hljl-t'> </span><span class='hljl-n'>i</span><span class='hljl-t'> </span><span class='hljl-kp'>in</span><span class='hljl-t'> </span><span class='hljl-ni'>1</span><span class='hljl-oB'>:</span><span class='hljl-ni'>100</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-nf'>loss</span><span class='hljl-p'>()</span>
</pre>


<pre class="output">
5118.585f0
</pre>


<p>This loss function takes 100 random points in <code>&#91;0,1&#93;</code> and then computes the output of the neural network minus <code>1</code> on each of the values, and sums up the squared values &#40;<code>abs2</code>&#41;. Why the squared values? This means that every computed loss value is positive, and so we know that by decreasing the loss this means that, on average our neural network outputs are closer to 1. What are the weights? Since we&#39;re using the Flux callable struct style from above, the weights are those inside of the <code>NN</code> chain object, which we can inspect:</p>


<pre class='hljl'>
<span class='hljl-n'>NN</span><span class='hljl-p'>[</span><span class='hljl-ni'>1</span><span class='hljl-p'>]</span><span class='hljl-oB'>.</span><span class='hljl-n'>W</span><span class='hljl-t'> </span><span class='hljl-cs'># The W matrix of the first layer</span>
</pre>


<pre class="output">
32×10 Array&#123;Float32,2&#125;:
  0.178179   -0.0572642  -0.0894479   0.1665      …   0.261059    -0.246197
 -0.344585   -0.169858    0.295461    0.199319        0.0831651    0.022302
8
  0.257104   -0.235654   -0.284184    0.117784       -0.17809      0.336285
 -0.360832   -0.278086    0.29686     0.246984        0.249758     0.003347
55
 -0.191965    0.291312   -0.362307    0.191132       -0.0761521    0.081189
6
 -0.367509   -0.307449    0.152211    0.225501    …   0.00290446  -0.311886
 -0.0870824  -0.093548    0.108141    0.340857        0.0783499   -0.117483
 -0.0741076   0.306695    0.178488   -0.0373634       0.33313      0.105541
  0.0179611   0.127147    0.0851364  -0.112369        0.148244    -0.357636
 -0.315881   -0.127374   -0.154361   -0.349332       -0.00241225   0.009149
07
  ⋮                                               ⋱               
 -0.161968    0.181037    0.362255    0.203591       -0.252297     0.161839
 -0.364117    0.341458   -0.0292531  -0.00389994     -0.113241     0.137249
 -0.0666277   0.133553    0.0432152   0.113794    …   0.0171254   -0.180427
  0.050103   -0.161901    0.369362   -0.252003       -0.25834      0.252459
 -0.301285    0.0529177  -0.318422   -0.0616636      -0.265852    -0.071839
8
 -0.200076   -0.280823    0.14862     0.122399        0.0350758   -0.076947
3
  0.184961   -0.185431   -0.083517   -0.064948       -0.309934    -0.012245
2
 -0.0559716  -0.231541   -0.334746    0.217256    …  -0.0782027    0.177771
  0.0102533  -0.341847   -0.248525   -0.253197       -0.023493     0.29358
</pre>


<p>Now let&#39;s grab all of the parameters together:</p>


<pre class='hljl'>
<span class='hljl-n'>p</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>params</span><span class='hljl-p'>(</span><span class='hljl-n'>NN</span><span class='hljl-p'>)</span>
</pre>


<pre class="output">
Params&#40;&#91;Float32&#91;0.17817877 -0.057264224 … 0.2610589 -0.24619716; -0.3445848
 -0.1698582 … 0.08316508 0.022302793; … ; -0.055971634 -0.23154116 … -0.078
202695 0.17777056; 0.010253323 -0.34184742 … -0.023493016 0.29358014&#93;, Floa
t32&#91;0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0
, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0&#93;, Float32&#91;-0.18483095 0.09198185 … -0.148875
45 -0.18770082; 0.15788232 -0.09012224 … 0.24015284 0.07276287; … ; -0.1981
4347 -0.047513098 … -0.19939171 -0.24471624; -0.042551473 -0.012920137 … 0.
07584524 -0.24311528&#93;, Float32&#91;0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0&#93;, Float32&#91;-0.3592
1615 -0.039866373 … -0.048012502 0.40115017; 0.26722172 -0.17609675 … 0.342
3274 -0.09435972; … ; 0.26929897 -0.19022429 … 0.33936748 -0.34868398; 0.27
01673 0.19228207 … 0.047233578 0.32165092&#93;, Float32&#91;0.0, 0.0, 0.0, 0.0, 0.0
&#93;&#93;&#41;
</pre>


<p>That&#39;s a helper function on <code>Chain</code> which recursively gathers all of the defining parameters. Let&#39;s not find the optimal values <code>p</code> which cause the neural network to be the constant <code>1</code> function:</p>


<pre class='hljl'>
<span class='hljl-n'>Flux</span><span class='hljl-oB'>.</span><span class='hljl-nf'>train!</span><span class='hljl-p'>(</span><span class='hljl-n'>loss</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>p</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>Iterators</span><span class='hljl-oB'>.</span><span class='hljl-nf'>repeated</span><span class='hljl-p'>((),</span><span class='hljl-t'> </span><span class='hljl-ni'>10000</span><span class='hljl-p'>),</span><span class='hljl-t'> </span><span class='hljl-nf'>ADAM</span><span class='hljl-p'>(</span><span class='hljl-nfB'>0.1</span><span class='hljl-p'>))</span>
</pre>



<p>Now let&#39;s check the loss:</p>


<pre class='hljl'>
<span class='hljl-nf'>loss</span><span class='hljl-p'>()</span>
</pre>


<pre class="output">
1.6048971f-6
</pre>


<p>This means that <code>NN&#40;x&#41;</code> is now a very good function approximator to <code>f&#40;x&#41; &#61; ones&#40;5&#41;</code>&#33;</p>
<h3>So Why Machine Learning? Why Neural Networks?</h3>
<p>All we did was find parameters that made <code>NN&#40;x&#41;</code> act like a function <code>f&#40;x&#41;</code>. How does that relate to machine learning? Well, in any case where one is acting on data <code>&#40;x,y&#41;</code>, the idea is to assume that there exists some underlying mathematical model <code>f&#40;x&#41; &#61; y</code>. If we had perfect knowledge of what <code>f</code> is, then from only the information of <code>x</code> we can then predict what <code>y</code> would be. The inference problem is to then figure out what function <code>f</code> should be. Therefore, machine learning on data is simply this problem of finding an approximator to some unknown function&#33;</p>
<p>So why neural networks? Neural networks satisfy two properties. The first of which is known as the Universal Approximation Theorem &#40;UAT&#41;, which in simple non-mathematical language means that, for any ϵ of accuracy, if your neural network is large enough &#40;has enough layers, the weight matrices are large enough&#41;, then it can approximate <strong>any</strong> &#40;nice&#41; function <code>f</code> within that ϵ. Therefore, we can reduce the problem of finding missing functions, the problem of machine learning, to a problem of finding a finding the weights of neural networks, which is a well-defined mathematical optimization problem.</p>
<p>Why neural networks specifically? That&#39;s a fairly good question, since there are many other functions with this property. For example, you will have learned from analysis that <span class="math">$a_0 + a_1 x + a_2 x^2 + \ldots$</span> arbitrary polynomials can be used to approximate any analytic function &#40;this is the Taylor series&#41;. Similarly, a Fourier series</p>
<p class="math">\[
f(x) = a_0 + \sum_k b_k \cos(kx) + c_k \sin(kx)
\]</p>
<p>can approximate any continuous function <code>f</code> &#40;and discontinuous functions also can have convergence, etc. these are the details of a harmonic analysis course&#41;.</p>
<p>That&#39;s all for one dimension. How about two dimensional functions? It turns out it&#39;s not difficult to prove that tensor products of universal approximators will give higher dimensional universal approximators. So for example, tensoring together two polynomials:</p>
<p class="math">\[
a_0 + a_1 x + a_2 y + a_3 x y + a_4 x^2 y + a_5 x y^2 + a_6 x^2 y^2 + \ldots
\]</p>
<p>will give a two-dimensional function approximator. But notice how we have to resolve every combination of terms. This means that if we used <code>n</code> coefficients in each dimension <code>d</code>, the total number of coefficients to build a <code>d</code>-dimensional universal approximator from one-dimensional objects would need <span class="math">$n^d$</span> coefficients. This exponential growth is known as <strong>the curse of dimensionality</strong>.</p>
<p>The second property of neural networks that makes them applicable to machine learning is that they overcome the curse of dimensionality. The proofs in this area <a href="https://arxiv.org/abs/1908.10828">can be a little difficult to parse</a>, but what they boil down to is proving in many cases that the growth of neural networks to sufficiently approximate a <code>d</code>-dimensional function grows as a polynomial of <code>d</code>, rather than exponential. This means that there&#39;s some dimensional cutoff where for <span class="math">$d>cutoff$</span> it is more efficient to use a neural network. This can be problem-specific, but generally it tends to be the case at least by 8 or 10 dimensions.</p>
<p>Neural networks have a few other properties to consider as well:</p>
<ol>
<li><p>The assumptions of the neural network can be encoded into the neural architectures. A neural network where the last layer has an activation function <code>x-&gt;x^2</code> is a neural network where all outputs are positive. This means that if you want to find a positive function, you can make the optimization easier by enforcing this constraint. A lot of other constraints can be enforced, like <code>tanh</code> activation functions can make the neural network be a smooth &#40;all derivatives finite&#41; function, or other activations can cause finite numbers of learnable discontinuities.</p>
</li>
<li><p>Generating higher dimensional forms from one dimensional forms does not have good symmetry. For example, the two-dimensional tensor Fourier basis does not have a good way to represent <span class="math">$sin(xy)$</span>. This property of the approximator is called &#40;non&#41;isotropy and more detail can be found in <a href="https://www.youtube.com/watch?v&#61;JngdaWe3-gg">this wonderful talk about function approximation for multidimensional integration &#40;cubature&#41;</a>. Neural networks are naturally not aligned to a basis.</p>
</li>
<li><p>Neural networks are &quot;easy&quot; to compute. There&#39;s good software for them, GPU-acceleration, and all other kinds of tooling that make them particularly simple to use.</p>
</li>
<li><p>There are proofs that in many scenarios for neural networks <a href="https://arxiv.org/abs/2006.05900">the local minima are the global minima</a>, meaning that local optimization is sufficient for training a neural network. Global optimization &#40;which we will cover later in the course&#41; is much more expensive than local methods like gradient descent, and thus this can be a good property to abuse for faster computation.</p>
</li>
</ol>
<h3>From Machine Learning to Scientific Machine Learning: Structure and Science</h3>
<p>This understanding of a neural network and their libraries directly bridges to the understanding of scientific machine learning and the computation done in the field. In scientific machine learning, neural networks and machine learning are used as the basis to solve problems in scientific computing. <a href="https://en.wikipedia.org/wiki/Computational_science">Scientific computing, as a discipline also known as Computational Science, is a field of study which focuses on scientific simulation, using tools such as differential equations to investigate physical, biological, and other phonomena</a>.</p>
<p>What we wish to do in scientific machine learning is use these properties of neural networks to improve the way that we investigate our scientific models.</p>
<h4>Aside: Why Differential Equations?</h4>
<p>Why do differential equations come up so often in as the model in the scientific context? This is a deep question with quite a simple answer. Essentially, all scientific experiments always have to test how things change. For example, you take a system now, you change it, and your measurement is how the changes you made caused changes in the system. This boils down to gather information about how, for some arbitrary system <span class="math">$y = f(x)$</span>, how <span class="math">$\Delta x$</span> is related to <span class="math">$\Delta y$</span>. Thus what you learn from scientific experiments, what is codified as scientific laws, is not &quot;the answer&quot;, but the answer to how things change. This process of writing down equations by describing how they change precisely gives differential equations.</p>
<h2>Solving ODEs with Neural Networks: The Physics-Informed Neural Network</h2>
<p>Now let&#39;s get to our first true SciML application: solving ordinary differential equations with neural networks. The process of solving a differential equation with a neural network, or using a differential equation as a regularizer in the loss function, is known as a <strong>physics-informed neural network</strong>, since this allows for physical equations to guide the training of the neural network in circumstances where data might be lacking.</p>
<h3>Background: A Method for Solving Ordinary Differential Equations with Neural Networks</h3>
<p><a href="https://arxiv.org/pdf/physics/9705023.pdf">This is a result first due to Lagaris et. al from 1998</a>. The idea is to solve differential equations using neural networks by representing the solution by a neural network and training the resulting network to satisfy the conditions required by the differential equation.</p>
<p>Let&#39;s say we want to solve a system of ordinary differential equations</p>
<p class="math">\[
u' = f(u,t)
\]</p>
<p>with <span class="math">$t \in [0,1]$</span> and a known initial condition <span class="math">$u(0)=u_0$</span>. To solve this, we approximate the solution by a neural network:</p>
<p class="math">\[
NN(t) \approx u(t)
\]</p>
<p>If <span class="math">$NN(t)$</span> was the true solution, then it would hold that <span class="math">$NN'(t) = f(NN(t),t)$</span> for all <span class="math">$t$</span>. Thus we turn this condition into our loss function. This motivates the loss function:</p>
<p class="math">\[
L(p) = \sum_i \left(\frac{dNN(t_i)}{dt} - f(NN(t_i),t_i) \right)^2
\]</p>
<p>The choice of <span class="math">$t_i$</span> could be done in many ways: it can be random, it can be a grid, etc. Anyways, when this loss function is minimized &#40;gradients computed with standard reverse-mode automatic differentiation&#41;, then we have that <span class="math">$\frac{dNN(t_i)}{dt} \approx f(NN(t_i),t_i)$</span> and thus <span class="math">$NN(t)$</span> approximately solves the differential equation.</p>
<p>Note that we still have to handle the initial condition. One simple way to do this is to add an initial condition term to the cost function. This would look like:</p>
<p class="math">\[
L(p) = (NN(0) - u_0)^2 + \sum_i \left(\frac{dNN(t_i)}{dt} - f(NN(t_i),t_i) \right)^2
\]</p>
<p>While that would work, it can be more efficient to encode the initial condition into the function itself so that it&#39;s trivially satisfied for any possible set of parameters. For example, instead of directly using a neural network, we can use:</p>
<p class="math">\[
g(t) = u_0 + tNN(t)
\]</p>
<p>as our solution. Notice that <span class="math">$g(t)$</span> is thus a universal approximator for all continuous functions such that <span class="math">$g(0)=0$</span> &#40;this is a property one should prove&#33;&#41;. Since <span class="math">$g(t)$</span> will always satisfy the initial condition, we can train <span class="math">$g(t)$</span> to satisfy the derivative function then it will automatically be a solution to the derivative function. In this sense, we can use the loss function:</p>
<p class="math">\[
L(p) = \sum_i \left(\frac{dg(t_i)}{dt} - f(g(t_i),t_i) \right)^2
\]</p>
<p>where <span class="math">$p$</span> are the parameters that define <span class="math">$g$</span>, which in turn are the parameters which define the neural network <span class="math">$NN$</span> that define <span class="math">$g$</span>. Thus this reduces down, once again, to simply finding weights which minimize a loss function&#33;</p>
<h3>Coding Up the Method</h3>
<p>Now let&#39;s implement this method with Flux. Let&#39;s define a neural network to be the <code>NN&#40;t&#41;</code> above. To make the problem easier, let&#39;s look at the ODE:</p>
<p class="math">\[
u' = \cos 2\pi t
\]</p>
<p>and approximate it with the neural network from a scalar to a scalar:</p>


<pre class='hljl'>
<span class='hljl-k'>using</span><span class='hljl-t'> </span><span class='hljl-n'>Flux</span><span class='hljl-t'>
</span><span class='hljl-n'>NNODE</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>Chain</span><span class='hljl-p'>(</span><span class='hljl-n'>x</span><span class='hljl-t'> </span><span class='hljl-oB'>-&gt;</span><span class='hljl-t'> </span><span class='hljl-p'>[</span><span class='hljl-n'>x</span><span class='hljl-p'>],</span><span class='hljl-t'> </span><span class='hljl-cs'># Take in a scalar and transform it into an array</span><span class='hljl-t'>
           </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>1</span><span class='hljl-p'>,</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-n'>tanh</span><span class='hljl-p'>),</span><span class='hljl-t'>
           </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-ni'>1</span><span class='hljl-p'>),</span><span class='hljl-t'>
           </span><span class='hljl-n'>first</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-cs'># Take first value, i.e. return a scalar</span><span class='hljl-t'>
</span><span class='hljl-nf'>NNODE</span><span class='hljl-p'>(</span><span class='hljl-nfB'>1.0</span><span class='hljl-p'>)</span>
</pre>


<pre class="output">
0.0905525f0
</pre>


<p>Instead of directly approximating the neural network, we will use the transformed equation that is forced to satisfy the boundary conditions. Using <code>u0&#61;1.0</code>, we have the function:</p>


<pre class='hljl'>
<span class='hljl-nf'>g</span><span class='hljl-p'>(</span><span class='hljl-n'>t</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>t</span><span class='hljl-oB'>*</span><span class='hljl-nf'>NNODE</span><span class='hljl-p'>(</span><span class='hljl-n'>t</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>+</span><span class='hljl-t'> </span><span class='hljl-nfB'>1f0</span>
</pre>


<pre class="output">
g &#40;generic function with 1 method&#41;
</pre>


<p>as our universal approximator. Thus, for this to be a function that satisfies</p>
<p class="math">\[
g' = \cos 2\pi t
\]</p>
<p>we would need that:</p>


<pre class='hljl'>
<span class='hljl-k'>using</span><span class='hljl-t'> </span><span class='hljl-n'>Statistics</span><span class='hljl-t'>
</span><span class='hljl-n'>ϵ</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>sqrt</span><span class='hljl-p'>(</span><span class='hljl-nf'>eps</span><span class='hljl-p'>(</span><span class='hljl-n'>Float32</span><span class='hljl-p'>))</span><span class='hljl-t'>
</span><span class='hljl-nf'>loss</span><span class='hljl-p'>()</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>mean</span><span class='hljl-p'>(</span><span class='hljl-nf'>abs2</span><span class='hljl-p'>(((</span><span class='hljl-nf'>g</span><span class='hljl-p'>(</span><span class='hljl-n'>t</span><span class='hljl-oB'>+</span><span class='hljl-n'>ϵ</span><span class='hljl-p'>)</span><span class='hljl-oB'>-</span><span class='hljl-nf'>g</span><span class='hljl-p'>(</span><span class='hljl-n'>t</span><span class='hljl-p'>))</span><span class='hljl-oB'>/</span><span class='hljl-n'>ϵ</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>-</span><span class='hljl-t'> </span><span class='hljl-nf'>cos</span><span class='hljl-p'>(</span><span class='hljl-ni'>2</span><span class='hljl-n'>π</span><span class='hljl-oB'>*</span><span class='hljl-n'>t</span><span class='hljl-p'>))</span><span class='hljl-t'> </span><span class='hljl-k'>for</span><span class='hljl-t'> </span><span class='hljl-n'>t</span><span class='hljl-t'> </span><span class='hljl-kp'>in</span><span class='hljl-t'> </span><span class='hljl-ni'>0</span><span class='hljl-oB'>:</span><span class='hljl-nfB'>1f-2</span><span class='hljl-oB'>:</span><span class='hljl-nfB'>1f0</span><span class='hljl-p'>)</span>
</pre>


<pre class="output">
loss &#40;generic function with 1 method&#41;
</pre>


<p>would be minimized.</p>


<pre class='hljl'>
<span class='hljl-n'>opt</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>Flux</span><span class='hljl-oB'>.</span><span class='hljl-nf'>Descent</span><span class='hljl-p'>(</span><span class='hljl-nfB'>0.01</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-n'>data</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>Iterators</span><span class='hljl-oB'>.</span><span class='hljl-nf'>repeated</span><span class='hljl-p'>((),</span><span class='hljl-t'> </span><span class='hljl-ni'>5000</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-n'>iter</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-ni'>0</span><span class='hljl-t'>
</span><span class='hljl-n'>cb</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-k'>function</span><span class='hljl-t'> </span><span class='hljl-p'>()</span><span class='hljl-t'> </span><span class='hljl-cs'>#callback function to observe training</span><span class='hljl-t'>
  </span><span class='hljl-kd'>global</span><span class='hljl-t'> </span><span class='hljl-n'>iter</span><span class='hljl-t'> </span><span class='hljl-oB'>+=</span><span class='hljl-t'> </span><span class='hljl-ni'>1</span><span class='hljl-t'>
  </span><span class='hljl-k'>if</span><span class='hljl-t'> </span><span class='hljl-n'>iter</span><span class='hljl-t'> </span><span class='hljl-oB'>%</span><span class='hljl-t'> </span><span class='hljl-ni'>500</span><span class='hljl-t'> </span><span class='hljl-oB'>==</span><span class='hljl-t'> </span><span class='hljl-ni'>0</span><span class='hljl-t'>
    </span><span class='hljl-nf'>display</span><span class='hljl-p'>(</span><span class='hljl-nf'>loss</span><span class='hljl-p'>())</span><span class='hljl-t'>
  </span><span class='hljl-k'>end</span><span class='hljl-t'>
</span><span class='hljl-k'>end</span><span class='hljl-t'>
</span><span class='hljl-nf'>display</span><span class='hljl-p'>(</span><span class='hljl-nf'>loss</span><span class='hljl-p'>())</span><span class='hljl-t'>
</span><span class='hljl-n'>Flux</span><span class='hljl-oB'>.</span><span class='hljl-nf'>train!</span><span class='hljl-p'>(</span><span class='hljl-n'>loss</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>Flux</span><span class='hljl-oB'>.</span><span class='hljl-nf'>params</span><span class='hljl-p'>(</span><span class='hljl-n'>NNODE</span><span class='hljl-p'>),</span><span class='hljl-t'> </span><span class='hljl-n'>data</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>opt</span><span class='hljl-p'>;</span><span class='hljl-t'> </span><span class='hljl-n'>cb</span><span class='hljl-oB'>=</span><span class='hljl-n'>cb</span><span class='hljl-p'>)</span>
</pre>


<pre class="output">
0.5139041026135249
0.4884001675454146
0.4493133325782942
0.3159551477493703
0.08191695378003047
0.012525667537545464
0.006295769127973989
0.005406328158867731
0.005111489567330365
0.004906525248550312
0.00471777023624818
</pre>


<p>How well did this do? Well if we take the integral of both sides of our differential equation, we see it&#39;s fairly trivial:</p>
<p class="math">\[
\int g' = g = \int \cos 2\pi t = C + \frac{\sin 2\pi t}{2\pi}
\]</p>
<p>where we defined <span class="math">$C = 1$</span>. Let&#39;s take a bunch of &#40;input,output&#41; pairs from the neural network and plot it against the analytical solution to the differential equation:</p>


<pre class='hljl'>
<span class='hljl-k'>using</span><span class='hljl-t'> </span><span class='hljl-n'>Plots</span><span class='hljl-t'>
</span><span class='hljl-n'>t</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-ni'>0</span><span class='hljl-oB'>:</span><span class='hljl-nfB'>0.001</span><span class='hljl-oB'>:</span><span class='hljl-nfB'>1.0</span><span class='hljl-t'>
</span><span class='hljl-nf'>plot</span><span class='hljl-p'>(</span><span class='hljl-n'>t</span><span class='hljl-p'>,</span><span class='hljl-n'>g</span><span class='hljl-oB'>.</span><span class='hljl-p'>(</span><span class='hljl-n'>t</span><span class='hljl-p'>),</span><span class='hljl-n'>label</span><span class='hljl-oB'>=</span><span class='hljl-s'>&quot;NN&quot;</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-nf'>plot!</span><span class='hljl-p'>(</span><span class='hljl-n'>t</span><span class='hljl-p'>,</span><span class='hljl-nfB'>1.0</span><span class='hljl-t'> </span><span class='hljl-oB'>.+</span><span class='hljl-t'> </span><span class='hljl-n'>sin</span><span class='hljl-oB'>.</span><span class='hljl-p'>(</span><span class='hljl-ni'>2</span><span class='hljl-n'>π</span><span class='hljl-oB'>.*</span><span class='hljl-n'>t</span><span class='hljl-p'>)</span><span class='hljl-oB'>/</span><span class='hljl-ni'>2</span><span class='hljl-n'>π</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>label</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-s'>&quot;True Solution&quot;</span><span class='hljl-p'>)</span>
</pre>


<img src=""  />

<p>We see that it matches very well, and we can keep improving this fit by increasing the size of the neural network, using more training points, and training for more iterations.</p>
<h3>Example: Harmonic Oscillator Informed Training</h3>
<p>Using this idea, differential equations encoding physical laws can be utilized inside of loss functions for terms which we have some basis to believe should approximately follow some physical system. Let&#39;s investigate this last step by looking at how to inform the training of a neural network using the harmonic oscillator.</p>
<p>Let&#39;s assume that we are taking measurements of &#40;position,force&#41; in some real one-dimensional spring pushing and pulling against a wall.</p>
<p><img src="https://thumbs.dreamstime.com/b/hookes-law-vector-illustration-physics-extend-spring-force-explanation-scheme-compress-mathematical-experiment-weight-177188357.jpg" alt="" /></p>
<p>But instead of the simple spring, let&#39;s assume we had a more complex spring, for example, let&#39;s say <span class="math">$F(x) = -kx + 0.1sin(x)$</span> where this extra term is due to some deformities in the medal &#40;assume mass&#61;1&#41;. Then by Newton&#39;s law of motion we have a second order ordinary differential equation:</p>
<p class="math">\[
x'' = -kx + 0.1 \sin(x)
\]</p>
<p>We can use the <a href="https://diffeq.sciml.ai/stable/">DifferentialEquations.jl package</a> to solve this differential equation and see what this system looks like:</p>


<pre class='hljl'>
<span class='hljl-k'>using</span><span class='hljl-t'> </span><span class='hljl-n'>DifferentialEquations</span><span class='hljl-t'>
</span><span class='hljl-n'>k</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nfB'>1.0</span><span class='hljl-t'>
</span><span class='hljl-nf'>force</span><span class='hljl-p'>(</span><span class='hljl-n'>dx</span><span class='hljl-p'>,</span><span class='hljl-n'>x</span><span class='hljl-p'>,</span><span class='hljl-n'>k</span><span class='hljl-p'>,</span><span class='hljl-n'>t</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-oB'>-</span><span class='hljl-n'>k</span><span class='hljl-oB'>*</span><span class='hljl-n'>x</span><span class='hljl-t'> </span><span class='hljl-oB'>+</span><span class='hljl-t'> </span><span class='hljl-nfB'>0.1</span><span class='hljl-nf'>sin</span><span class='hljl-p'>(</span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-n'>prob</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>SecondOrderODEProblem</span><span class='hljl-p'>(</span><span class='hljl-n'>force</span><span class='hljl-p'>,</span><span class='hljl-nfB'>1.0</span><span class='hljl-p'>,</span><span class='hljl-nfB'>0.0</span><span class='hljl-p'>,(</span><span class='hljl-nfB'>0.0</span><span class='hljl-p'>,</span><span class='hljl-nfB'>10.0</span><span class='hljl-p'>),</span><span class='hljl-n'>k</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-n'>sol</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>solve</span><span class='hljl-p'>(</span><span class='hljl-n'>prob</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-nf'>plot</span><span class='hljl-p'>(</span><span class='hljl-n'>sol</span><span class='hljl-p'>,</span><span class='hljl-n'>label</span><span class='hljl-oB'>=</span><span class='hljl-p'>[</span><span class='hljl-s'>&quot;Velocity&quot;</span><span class='hljl-t'> </span><span class='hljl-s'>&quot;Position&quot;</span><span class='hljl-p'>])</span>
</pre>


<img src=""  />

<p>Don&#39;t worry if you don&#39;t understand this sytnax yet: we will go over differential equation solvers and DifferentialEquations.jl in a later lecture.</p>
<p>Let&#39;s say we want to learn how to predict the force applied on the spring at each point in space, <span class="math">$F(x)$</span>. We want to learn a function, so this is the job for machine learning&#33; However, we only have 6 measurements, which includes the information about &#40;position,velocity,force&#41; at evenly spaced times:</p>


<pre class='hljl'>
<span class='hljl-n'>plot_t</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-ni'>0</span><span class='hljl-oB'>:</span><span class='hljl-nfB'>0.01</span><span class='hljl-oB'>:</span><span class='hljl-ni'>10</span><span class='hljl-t'>
</span><span class='hljl-n'>data_plot</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>sol</span><span class='hljl-p'>(</span><span class='hljl-n'>plot_t</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-n'>positions_plot</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-p'>[</span><span class='hljl-n'>state</span><span class='hljl-p'>[</span><span class='hljl-ni'>2</span><span class='hljl-p'>]</span><span class='hljl-t'> </span><span class='hljl-k'>for</span><span class='hljl-t'> </span><span class='hljl-n'>state</span><span class='hljl-t'> </span><span class='hljl-kp'>in</span><span class='hljl-t'> </span><span class='hljl-n'>data_plot</span><span class='hljl-p'>]</span><span class='hljl-t'>
</span><span class='hljl-n'>force_plot</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-p'>[</span><span class='hljl-nf'>force</span><span class='hljl-p'>(</span><span class='hljl-n'>state</span><span class='hljl-p'>[</span><span class='hljl-ni'>1</span><span class='hljl-p'>],</span><span class='hljl-n'>state</span><span class='hljl-p'>[</span><span class='hljl-ni'>2</span><span class='hljl-p'>],</span><span class='hljl-n'>k</span><span class='hljl-p'>,</span><span class='hljl-n'>t</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-k'>for</span><span class='hljl-t'> </span><span class='hljl-n'>state</span><span class='hljl-t'> </span><span class='hljl-kp'>in</span><span class='hljl-t'> </span><span class='hljl-n'>data_plot</span><span class='hljl-p'>]</span><span class='hljl-t'>

</span><span class='hljl-cs'># Generate the dataset</span><span class='hljl-t'>
</span><span class='hljl-n'>t</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-ni'>0</span><span class='hljl-oB'>:</span><span class='hljl-nfB'>3.3</span><span class='hljl-oB'>:</span><span class='hljl-ni'>10</span><span class='hljl-t'>
</span><span class='hljl-n'>dataset</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>sol</span><span class='hljl-p'>(</span><span class='hljl-n'>t</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-n'>position_data</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-p'>[</span><span class='hljl-n'>state</span><span class='hljl-p'>[</span><span class='hljl-ni'>2</span><span class='hljl-p'>]</span><span class='hljl-t'> </span><span class='hljl-k'>for</span><span class='hljl-t'> </span><span class='hljl-n'>state</span><span class='hljl-t'> </span><span class='hljl-kp'>in</span><span class='hljl-t'> </span><span class='hljl-nf'>sol</span><span class='hljl-p'>(</span><span class='hljl-n'>t</span><span class='hljl-p'>)]</span><span class='hljl-t'>
</span><span class='hljl-n'>force_data</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-p'>[</span><span class='hljl-nf'>force</span><span class='hljl-p'>(</span><span class='hljl-n'>state</span><span class='hljl-p'>[</span><span class='hljl-ni'>1</span><span class='hljl-p'>],</span><span class='hljl-n'>state</span><span class='hljl-p'>[</span><span class='hljl-ni'>2</span><span class='hljl-p'>],</span><span class='hljl-n'>k</span><span class='hljl-p'>,</span><span class='hljl-n'>t</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-k'>for</span><span class='hljl-t'> </span><span class='hljl-n'>state</span><span class='hljl-t'> </span><span class='hljl-kp'>in</span><span class='hljl-t'> </span><span class='hljl-nf'>sol</span><span class='hljl-p'>(</span><span class='hljl-n'>t</span><span class='hljl-p'>)]</span><span class='hljl-t'>

</span><span class='hljl-nf'>plot</span><span class='hljl-p'>(</span><span class='hljl-n'>plot_t</span><span class='hljl-p'>,</span><span class='hljl-n'>force_plot</span><span class='hljl-p'>,</span><span class='hljl-n'>xlabel</span><span class='hljl-oB'>=</span><span class='hljl-s'>&quot;t&quot;</span><span class='hljl-p'>,</span><span class='hljl-n'>label</span><span class='hljl-oB'>=</span><span class='hljl-s'>&quot;True Force&quot;</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-nf'>scatter!</span><span class='hljl-p'>(</span><span class='hljl-n'>t</span><span class='hljl-p'>,</span><span class='hljl-n'>force_data</span><span class='hljl-p'>,</span><span class='hljl-n'>label</span><span class='hljl-oB'>=</span><span class='hljl-s'>&quot;Force Measurements&quot;</span><span class='hljl-p'>)</span>
</pre>


<img src=""  />

<p>Can we train a neural network to approximate the expected force at any location for this spring? To see whether this is possible with a standard neural network, let&#39;s just do it. Let&#39;s define a neural network to be <span class="math">$F(x)$</span> and see if we can learn the force function&#33;</p>


<pre class='hljl'>
<span class='hljl-n'>NNForce</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>Chain</span><span class='hljl-p'>(</span><span class='hljl-n'>x</span><span class='hljl-t'> </span><span class='hljl-oB'>-&gt;</span><span class='hljl-t'> </span><span class='hljl-p'>[</span><span class='hljl-n'>x</span><span class='hljl-p'>],</span><span class='hljl-t'>
           </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>1</span><span class='hljl-p'>,</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-n'>tanh</span><span class='hljl-p'>),</span><span class='hljl-t'>
           </span><span class='hljl-nf'>Dense</span><span class='hljl-p'>(</span><span class='hljl-ni'>32</span><span class='hljl-p'>,</span><span class='hljl-ni'>1</span><span class='hljl-p'>),</span><span class='hljl-t'>
           </span><span class='hljl-n'>first</span><span class='hljl-p'>)</span>
</pre>


<pre class="output">
Chain&#40;#21, Dense&#40;1, 32, tanh&#41;, Dense&#40;32, 1&#41;, first&#41;
</pre>


<p>Now our loss function will be to match the force at the &#40;position,force&#41; pairs in the dataset:</p>


<pre class='hljl'>
<span class='hljl-nf'>loss</span><span class='hljl-p'>()</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>sum</span><span class='hljl-p'>(</span><span class='hljl-n'>abs2</span><span class='hljl-p'>,</span><span class='hljl-nf'>NNForce</span><span class='hljl-p'>(</span><span class='hljl-n'>position_data</span><span class='hljl-p'>[</span><span class='hljl-n'>i</span><span class='hljl-p'>])</span><span class='hljl-t'> </span><span class='hljl-oB'>-</span><span class='hljl-t'> </span><span class='hljl-n'>force_data</span><span class='hljl-p'>[</span><span class='hljl-n'>i</span><span class='hljl-p'>]</span><span class='hljl-t'> </span><span class='hljl-k'>for</span><span class='hljl-t'> </span><span class='hljl-n'>i</span><span class='hljl-t'> </span><span class='hljl-kp'>in</span><span class='hljl-t'> </span><span class='hljl-ni'>1</span><span class='hljl-oB'>:</span><span class='hljl-nf'>length</span><span class='hljl-p'>(</span><span class='hljl-n'>position_data</span><span class='hljl-p'>))</span><span class='hljl-t'>
</span><span class='hljl-nf'>loss</span><span class='hljl-p'>()</span>
</pre>


<pre class="output">
0.0010538533168833158
</pre>


<p>Our random parameters do not do so well, so let&#39;s train&#33;</p>


<pre class='hljl'>
<span class='hljl-n'>opt</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>Flux</span><span class='hljl-oB'>.</span><span class='hljl-nf'>Descent</span><span class='hljl-p'>(</span><span class='hljl-nfB'>0.01</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-n'>data</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>Iterators</span><span class='hljl-oB'>.</span><span class='hljl-nf'>repeated</span><span class='hljl-p'>((),</span><span class='hljl-t'> </span><span class='hljl-ni'>5000</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-n'>iter</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-ni'>0</span><span class='hljl-t'>
</span><span class='hljl-n'>cb</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-k'>function</span><span class='hljl-t'> </span><span class='hljl-p'>()</span><span class='hljl-t'> </span><span class='hljl-cs'>#callback function to observe training</span><span class='hljl-t'>
  </span><span class='hljl-kd'>global</span><span class='hljl-t'> </span><span class='hljl-n'>iter</span><span class='hljl-t'> </span><span class='hljl-oB'>+=</span><span class='hljl-t'> </span><span class='hljl-ni'>1</span><span class='hljl-t'>
  </span><span class='hljl-k'>if</span><span class='hljl-t'> </span><span class='hljl-n'>iter</span><span class='hljl-t'> </span><span class='hljl-oB'>%</span><span class='hljl-t'> </span><span class='hljl-ni'>500</span><span class='hljl-t'> </span><span class='hljl-oB'>==</span><span class='hljl-t'> </span><span class='hljl-ni'>0</span><span class='hljl-t'>
    </span><span class='hljl-nf'>display</span><span class='hljl-p'>(</span><span class='hljl-nf'>loss</span><span class='hljl-p'>())</span><span class='hljl-t'>
  </span><span class='hljl-k'>end</span><span class='hljl-t'>
</span><span class='hljl-k'>end</span><span class='hljl-t'>
</span><span class='hljl-nf'>display</span><span class='hljl-p'>(</span><span class='hljl-nf'>loss</span><span class='hljl-p'>())</span><span class='hljl-t'>
</span><span class='hljl-n'>Flux</span><span class='hljl-oB'>.</span><span class='hljl-nf'>train!</span><span class='hljl-p'>(</span><span class='hljl-n'>loss</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>Flux</span><span class='hljl-oB'>.</span><span class='hljl-nf'>params</span><span class='hljl-p'>(</span><span class='hljl-n'>NNForce</span><span class='hljl-p'>),</span><span class='hljl-t'> </span><span class='hljl-n'>data</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>opt</span><span class='hljl-p'>;</span><span class='hljl-t'> </span><span class='hljl-n'>cb</span><span class='hljl-oB'>=</span><span class='hljl-n'>cb</span><span class='hljl-p'>)</span>
</pre>


<pre class="output">
0.0010538533168833158
0.000836430782284745
0.000714624520713768
0.0006102105734944394
0.0005207373034993863
0.0004441050107670813
0.000378508653041744
0.00032239447679009264
0.00027442625204749966
0.00023344991699343048
0.000198472185221858
</pre>


<p>The neural network almost exactly matched the dataset, but how well did it actually learn the real force function? Let&#39;s plot it to see:</p>


<pre class='hljl'>
<span class='hljl-n'>learned_force_plot</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>NNForce</span><span class='hljl-oB'>.</span><span class='hljl-p'>(</span><span class='hljl-n'>positions_plot</span><span class='hljl-p'>)</span><span class='hljl-t'>

</span><span class='hljl-nf'>plot</span><span class='hljl-p'>(</span><span class='hljl-n'>plot_t</span><span class='hljl-p'>,</span><span class='hljl-n'>force_plot</span><span class='hljl-p'>,</span><span class='hljl-n'>xlabel</span><span class='hljl-oB'>=</span><span class='hljl-s'>&quot;t&quot;</span><span class='hljl-p'>,</span><span class='hljl-n'>label</span><span class='hljl-oB'>=</span><span class='hljl-s'>&quot;True Force&quot;</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-nf'>plot!</span><span class='hljl-p'>(</span><span class='hljl-n'>plot_t</span><span class='hljl-p'>,</span><span class='hljl-n'>learned_force_plot</span><span class='hljl-p'>,</span><span class='hljl-n'>label</span><span class='hljl-oB'>=</span><span class='hljl-s'>&quot;Predicted Force&quot;</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-nf'>scatter!</span><span class='hljl-p'>(</span><span class='hljl-n'>t</span><span class='hljl-p'>,</span><span class='hljl-n'>force_data</span><span class='hljl-p'>,</span><span class='hljl-n'>label</span><span class='hljl-oB'>=</span><span class='hljl-s'>&quot;Force Measurements&quot;</span><span class='hljl-p'>)</span>
</pre>


<img src=""  />

<p>Ouch. The problem is that a neural network can approximate any function, so it approximated <em>a</em> function that fits the data, but not <em>the correct</em> function. We somehow need to have more data... but where can we get more data?</p>
<p>Well, even a first year undergrad in physics will know Hooke&#39;s law, which is that the idealized spring should satisfy <span class="math">$F(x) = -kx$</span>. This is a decent assumption for the evolution of the system:</p>


<pre class='hljl'>
<span class='hljl-nf'>force2</span><span class='hljl-p'>(</span><span class='hljl-n'>dx</span><span class='hljl-p'>,</span><span class='hljl-n'>x</span><span class='hljl-p'>,</span><span class='hljl-n'>k</span><span class='hljl-p'>,</span><span class='hljl-n'>t</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-oB'>-</span><span class='hljl-n'>k</span><span class='hljl-oB'>*</span><span class='hljl-n'>x</span><span class='hljl-t'>
</span><span class='hljl-n'>prob_simplified</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>SecondOrderODEProblem</span><span class='hljl-p'>(</span><span class='hljl-n'>force2</span><span class='hljl-p'>,</span><span class='hljl-nfB'>1.0</span><span class='hljl-p'>,</span><span class='hljl-nfB'>0.0</span><span class='hljl-p'>,(</span><span class='hljl-nfB'>0.0</span><span class='hljl-p'>,</span><span class='hljl-nfB'>10.0</span><span class='hljl-p'>),</span><span class='hljl-n'>k</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-n'>sol_simplified</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>solve</span><span class='hljl-p'>(</span><span class='hljl-n'>prob_simplified</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-nf'>plot</span><span class='hljl-p'>(</span><span class='hljl-n'>sol</span><span class='hljl-p'>,</span><span class='hljl-n'>label</span><span class='hljl-oB'>=</span><span class='hljl-p'>[</span><span class='hljl-s'>&quot;Velocity&quot;</span><span class='hljl-t'> </span><span class='hljl-s'>&quot;Position&quot;</span><span class='hljl-p'>])</span><span class='hljl-t'>
</span><span class='hljl-nf'>plot!</span><span class='hljl-p'>(</span><span class='hljl-n'>sol_simplified</span><span class='hljl-p'>,</span><span class='hljl-n'>label</span><span class='hljl-oB'>=</span><span class='hljl-p'>[</span><span class='hljl-s'>&quot;Velocity Simplified&quot;</span><span class='hljl-t'> </span><span class='hljl-s'>&quot;Position Simplified&quot;</span><span class='hljl-p'>])</span>
</pre>


<img src=""  />

<p>While it&#39;snot quite correct, and it definitely drifts near the end, it should be a useful non-data assumption that we can add to improve the fitting. So, assuming we know <span class="math">$k$</span> &#40;this lab you probably have done before&#33;&#41;, we can regularize this fitting by having a term that states our neural network should be the solution to the differential equation.</p>
<p>This term looks like what we had done before:</p>


<pre class='hljl'>
<span class='hljl-n'>random_positions</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-p'>[</span><span class='hljl-ni'>2</span><span class='hljl-nf'>rand</span><span class='hljl-p'>()</span><span class='hljl-oB'>-</span><span class='hljl-ni'>1</span><span class='hljl-t'> </span><span class='hljl-k'>for</span><span class='hljl-t'> </span><span class='hljl-n'>i</span><span class='hljl-t'> </span><span class='hljl-kp'>in</span><span class='hljl-t'> </span><span class='hljl-ni'>1</span><span class='hljl-oB'>:</span><span class='hljl-ni'>100</span><span class='hljl-p'>]</span><span class='hljl-t'> </span><span class='hljl-cs'># random values in [-1,1]</span><span class='hljl-t'>
</span><span class='hljl-nf'>loss_ode</span><span class='hljl-p'>()</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>sum</span><span class='hljl-p'>(</span><span class='hljl-n'>abs2</span><span class='hljl-p'>,</span><span class='hljl-nf'>NNForce</span><span class='hljl-p'>(</span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-oB'>-</span><span class='hljl-t'> </span><span class='hljl-p'>(</span><span class='hljl-oB'>-</span><span class='hljl-n'>k</span><span class='hljl-oB'>*</span><span class='hljl-n'>x</span><span class='hljl-p'>)</span><span class='hljl-t'> </span><span class='hljl-k'>for</span><span class='hljl-t'> </span><span class='hljl-n'>x</span><span class='hljl-t'> </span><span class='hljl-kp'>in</span><span class='hljl-t'> </span><span class='hljl-n'>random_positions</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-nf'>loss_ode</span><span class='hljl-p'>()</span>
</pre>


<pre class="output">
6.440667340398036
</pre>


<p>If this term is zero, then <span class="math">$F(x) = -kx$</span>, which is approximately true. So now let&#39;s put these together:</p>


<pre class='hljl'>
<span class='hljl-n'>λ</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nfB'>0.1</span><span class='hljl-t'>
</span><span class='hljl-nf'>composed_loss</span><span class='hljl-p'>()</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-nf'>loss</span><span class='hljl-p'>()</span><span class='hljl-t'> </span><span class='hljl-oB'>+</span><span class='hljl-t'> </span><span class='hljl-n'>λ</span><span class='hljl-oB'>*</span><span class='hljl-nf'>loss_ode</span><span class='hljl-p'>()</span>
</pre>


<pre class="output">
composed_loss &#40;generic function with 1 method&#41;
</pre>


<p>where <span class="math">$λ$</span> is some weight factor to control the regularization against the physics assumption. Now we can train the physics-informed neural network:</p>


<pre class='hljl'>
<span class='hljl-n'>opt</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>Flux</span><span class='hljl-oB'>.</span><span class='hljl-nf'>Descent</span><span class='hljl-p'>(</span><span class='hljl-nfB'>0.01</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-n'>data</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>Iterators</span><span class='hljl-oB'>.</span><span class='hljl-nf'>repeated</span><span class='hljl-p'>((),</span><span class='hljl-t'> </span><span class='hljl-ni'>5000</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-n'>iter</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-ni'>0</span><span class='hljl-t'>
</span><span class='hljl-n'>cb</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-k'>function</span><span class='hljl-t'> </span><span class='hljl-p'>()</span><span class='hljl-t'> </span><span class='hljl-cs'>#callback function to observe training</span><span class='hljl-t'>
  </span><span class='hljl-kd'>global</span><span class='hljl-t'> </span><span class='hljl-n'>iter</span><span class='hljl-t'> </span><span class='hljl-oB'>+=</span><span class='hljl-t'> </span><span class='hljl-ni'>1</span><span class='hljl-t'>
  </span><span class='hljl-k'>if</span><span class='hljl-t'> </span><span class='hljl-n'>iter</span><span class='hljl-t'> </span><span class='hljl-oB'>%</span><span class='hljl-t'> </span><span class='hljl-ni'>500</span><span class='hljl-t'> </span><span class='hljl-oB'>==</span><span class='hljl-t'> </span><span class='hljl-ni'>0</span><span class='hljl-t'>
    </span><span class='hljl-nf'>display</span><span class='hljl-p'>(</span><span class='hljl-nf'>composed_loss</span><span class='hljl-p'>())</span><span class='hljl-t'>
  </span><span class='hljl-k'>end</span><span class='hljl-t'>
</span><span class='hljl-k'>end</span><span class='hljl-t'>
</span><span class='hljl-nf'>display</span><span class='hljl-p'>(</span><span class='hljl-nf'>composed_loss</span><span class='hljl-p'>())</span><span class='hljl-t'>
</span><span class='hljl-n'>Flux</span><span class='hljl-oB'>.</span><span class='hljl-nf'>train!</span><span class='hljl-p'>(</span><span class='hljl-n'>composed_loss</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>Flux</span><span class='hljl-oB'>.</span><span class='hljl-nf'>params</span><span class='hljl-p'>(</span><span class='hljl-n'>NNForce</span><span class='hljl-p'>),</span><span class='hljl-t'> </span><span class='hljl-n'>data</span><span class='hljl-p'>,</span><span class='hljl-t'> </span><span class='hljl-n'>opt</span><span class='hljl-p'>;</span><span class='hljl-t'> </span><span class='hljl-n'>cb</span><span class='hljl-oB'>=</span><span class='hljl-n'>cb</span><span class='hljl-p'>)</span><span class='hljl-t'>

</span><span class='hljl-n'>learned_force_plot</span><span class='hljl-t'> </span><span class='hljl-oB'>=</span><span class='hljl-t'> </span><span class='hljl-n'>NNForce</span><span class='hljl-oB'>.</span><span class='hljl-p'>(</span><span class='hljl-n'>positions_plot</span><span class='hljl-p'>)</span><span class='hljl-t'>

</span><span class='hljl-nf'>plot</span><span class='hljl-p'>(</span><span class='hljl-n'>plot_t</span><span class='hljl-p'>,</span><span class='hljl-n'>force_plot</span><span class='hljl-p'>,</span><span class='hljl-n'>xlabel</span><span class='hljl-oB'>=</span><span class='hljl-s'>&quot;t&quot;</span><span class='hljl-p'>,</span><span class='hljl-n'>label</span><span class='hljl-oB'>=</span><span class='hljl-s'>&quot;True Force&quot;</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-nf'>plot!</span><span class='hljl-p'>(</span><span class='hljl-n'>plot_t</span><span class='hljl-p'>,</span><span class='hljl-n'>learned_force_plot</span><span class='hljl-p'>,</span><span class='hljl-n'>label</span><span class='hljl-oB'>=</span><span class='hljl-s'>&quot;Predicted Force&quot;</span><span class='hljl-p'>)</span><span class='hljl-t'>
</span><span class='hljl-nf'>scatter!</span><span class='hljl-p'>(</span><span class='hljl-n'>t</span><span class='hljl-p'>,</span><span class='hljl-n'>force_data</span><span class='hljl-p'>,</span><span class='hljl-n'>label</span><span class='hljl-oB'>=</span><span class='hljl-s'>&quot;Force Measurements&quot;</span><span class='hljl-p'>)</span>
</pre>


<pre class="output">
0.6442652062250255
0.0005589969520268303
0.0005260149554856507
0.000496891015008568
0.00047094084847322613
0.00044765155822109714
0.00042661564278918785
0.0004075182673638757
0.0003900953975468654
0.0003741329592867036
0.000359452998884364
</pre>

<img src=""  />

<p>And there we go: we have used knowledge of physics to help inform our neural network training process&#33;</p>
<h2>Conclusion</h2>
<p>In this lecture we motivated machine learning not as a process of predicting from data but as a process for learning arbitrary nonlinear functions. Neural networks were just one choice of possible function. We then demonstrated how differential equations could be solved using this function approximation technique and then put together these two domains, solving differential equations and approximating data, into a single process to allow for physical knowledge to be embedded into the training process of a neural network, thus arriving at a physics-informed neural network. This is just one method in scientific machine learning which we will be exploring in more detail, demonstrating how we can utilize scientific knowledge to improve fits and allow for data-efficient machine learning.</p>


        <HR/>
        <div class="footer">
          <p>
            Published from <a href="sciml.jmd">sciml.jmd</a>
            using <a href="http://github.com/JunoLab/Weave.jl">Weave.jl</a> v0.10.3 on 2020-09-10.
          </p>
        </div>
      </div>
    </div>
  </div>
</BODY>

</HTML>
